Abstract



We study the problem of answering k-way marginal queries on a database $D \in (\{0,1\}^d)^n$,
while preserving differential privacy. The answer to a k-way marginal query is the fraction of
the database's records $x \in \{0,1\}^d$ with a given value in each of a given set of up to k columns.
Marginal queries enable a rich class of statistical analyses on a dataset, and designing efficient
algorithms for privately answering marginal queries has been identified as an important open
problem in private data analysis.

For any k, we give a differentially private online algorithm that runs in time

$$\min\left\{\exp\left(d^{1-\Omega(1/\sqrt{k})}\right),\ \exp\left(d/\log^{.99} d\right)\right\}$$

per query and answers any (possibly superpolynomially long and adaptively chosen) sequence of
k-way marginal queries up to error at most ±.01 on every query, provided $n \ge d^{.51}$. To the best
of our knowledge, this is the first algorithm capable of privately answering marginal queries with
a non-trivial worst-case accuracy guarantee on a database of size poly(d, k) in time $\exp(o(d))$.

Our algorithms are a variant of the private multiplicative weights algorithm (Hardt and
Rothblum, FOCS '10), but using a different low-weight representation of the database. We
derive our low-weight representation using approximations to the OR function by low-degree
polynomials with coefficients of bounded $L_1$-norm. We also prove a strong limitation on our
approach that is of independent approximation-theoretic interest. Specifically, we show that for
any $k = o(\log d)$, any polynomial with coefficients of $L_1$-norm poly(d) that pointwise
approximates the d-variate OR function on all inputs of Hamming weight at most k must have degree
$d^{1-O(1/\sqrt{k})}$.

1 Introduction

Consider a database $D \in (\{0,1\}^d)^n$ in which each of the $n$ $(= |D|)$ rows corresponds to an
individual's record, and each record consists of d binary attributes. The goal of privacy-preserving data
analysis is to enable rich statistical analyses on the database while protecting the privacy of the
individuals. In this work, we seek to achieve differential privacy [DMNS06], which guarantees that
no individual's data has a significant influence on the information released about the database.

One of the most important classes of statistics on a dataset is its marginals. A marginal query
is specified by a set $S \subseteq [d]$ and a pattern $t \in \{0,1\}^{|S|}$. The query asks, "What fraction of
the individual records in D has each of the attributes $j \in S$ set to $t_j$?" A major open problem
in privacy-preserving data analysis is to efficiently create a differentially private summary of the
database that enables analysts to answer each of the $3^d$ marginal queries. A natural subclass of
marginals are k-way marginals, the subset of marginals specified by sets $S \subseteq [d]$ such that $|S| \le k$.
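To make the definition concrete, here is a minimal non-private sketch of evaluating a marginal query. The function names, the 0-indexed columns, and the toy database are illustrative assumptions, not notation from the paper.

```python
from typing import Dict, Sequence


def marginal(db: Sequence[Sequence[int]], pattern: Dict[int, int]) -> float:
    """Fraction of rows x with x[j] == pattern[j] for every column j in the pattern.

    The keys of `pattern` play the role of the set S, its values the pattern t.
    """
    hits = sum(all(row[j] == bit for j, bit in pattern.items()) for row in db)
    return hits / len(db)


def num_marginals(d: int) -> int:
    """Each column is either outside S, or inside S with target bit 0 or 1."""
    return 3 ** d


# Hypothetical toy database with n = 4 records and d = 3 binary attributes.
toy_db = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0)]
two_way = marginal(toy_db, {0: 1, 1: 0})  # S = {0, 1}, t = (1, 0)
```

Here `two_way` is the answer to one 2-way marginal query; a k-way marginal is any call whose pattern has at most k keys.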

Privately answering marginal queries is a special case of the more general problem of privately
answering counting queries on the database, which are queries of the form, "What fraction of
individual records in D satisfy some property q?" Early work in differential privacy [DN03, BDMN05,
DMNS06] showed how to privately answer any set of counting queries Q approximately, and yet with
good accuracy (say, within ±.01 of the true answer), by perturbing the answers with appropriately
calibrated noise, provided $|D| \gtrsim |Q|^{1/2}$.
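As a rough illustration of "perturbing the answers with appropriately calibrated noise", the sketch below adds independent Laplace noise to each answer. It is an assumption-laden simplification: it uses basic composition for pure ε-differential privacy, which needs roughly $|D| \gtrsim |Q|/\varepsilon$, whereas the $|Q|^{1/2}$ bound quoted above relies on a tighter analysis that is not shown here. All names are hypothetical.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_answers(true_answers: Sequence[float], n: int, epsilon: float,
                  rng: random.Random) -> List[float]:
    """Perturb each counting-query answer independently.

    Changing one of the n rows moves each normalized answer by at most 1/n, so
    under basic composition a noise scale of |Q| / (epsilon * n) suffices for
    epsilon-differential privacy over all |Q| answers.
    """
    scale = len(true_answers) / (epsilon * n)
    return [a + laplace_noise(scale, rng) for a in true_answers]


rng = random.Random(0)
exact = [0.5, 0.25, 0.75]
released = noisy_answers(exact, n=1_000_000, epsilon=1.0, rng=rng)
```

With a million rows and three queries the noise scale is 3e-6, so the released values stay far inside the ±.01 error target.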

However, in many settings data is difficult or expensive to obtain, and the requirement that
$|D| \gtrsim |Q|^{1/2}$ is too restrictive. For instance, if the query set Q includes all k-way marginal queries
then $|Q| \ge d^{\Theta(k)}$, and it may be impractical to collect enough data to ensure $|D| \gtrsim |Q|^{1/2}$, even
for moderate values of k. Fortunately, a remarkable line of work initiated by Blum et al. [BLR08]
and continuing with [DNR+09, DRV10, RR10, HR10, HLM12, GRU12, JT12] has shown how to
privately release approximate answers to any set of counting queries, even when $|Q|$ is exponentially
larger than $|D|$. For example, the online private multiplicative weights algorithm of Hardt and
Rothblum [HR10] gives accurate answers to any (possibly adaptively chosen) sequence of queries Q
provided $|D| \gtrsim \sqrt{d} \log |Q|$. Hence, if the sequence consists of all k-way marginal queries, then the
algorithm will give accurate answers provided $|D| \gtrsim k\sqrt{d}$. Unfortunately, all of these algorithms
have running time at least $2^d$ per query, even in the simplest setting where Q is the set of 2-way
marginals.

Given this state of affairs, it is natural to seek efficient algorithms capable of privately releasing
approximate answers to marginal queries even when $|D| \ll d^{\Theta(k)}$. The most efficient algorithm known
for this problem answers all k-way marginal queries in time $d^{O(\sqrt{k})}$ and answers every conjunction
query to within ±.01 provided $|D| \ge d^{O(\sqrt{k})}$ [TUV12].^1

Even though $|D|$ can be much smaller than $|Q|^{1/2}$, a major drawback of this algorithm, and
other efficient algorithms for releasing marginals (e.g. [GHRU11, CKKL12, HRS12, FK13]), is that
the database still must be significantly larger than $\tilde{\Theta}(k\sqrt{d})$, which we know would suffice for
inefficient algorithms. Recent experimental work of Hardt et al. [HLM12] demonstrates that for
some databases of interest, even the $2^d$-time private multiplicative weights algorithm is practical,
and also shows that more efficient algorithms based on adding independent noise do not provide
good accuracy for these databases. Motivated by these findings, we believe that an important
approach to designing practical algorithms is to achieve a minimum database size comparable to
that of private multiplicative weights, and seek to optimize the running time of the algorithm as
much as possible. In this paper we give the first algorithms for privately answering marginal queries
for this parameter regime.

^1 More precisely, the algorithm in [TUV12] runs in time $d^{O(\sqrt{k})}$ and releases a summary from which an analyst can
compute the answer to any k-way marginal query in time $d^{O(\sqrt{k})}$.

[Table 1 body not recovered; surviving column header: "Running Time per Query".]

Table 1: Summary of prior results on differentially private release of k-way marginals with error
±.01 on every marginal. Note that the running time ignores dependence on the database size,
privacy parameters, and the time required to evaluate the query non-privately.



1.1 Our Results 

In this paper we give faster algorithms for privately answering marginal queries on databases of
size $\tilde{O}(d^{.51}/\varepsilon)$, which is nearly the smallest a database can be while admitting any differentially
private approximation to marginal queries.^2

Theorem 1.1. There exists a constant C > 0 such that for every $k, d, n \in \mathbb{N}$, $k \le d$, and every
$\varepsilon, \delta > 0$, there is an $(\varepsilon, \delta)$-differentially private online algorithm that, on input a database
$D \in (\{0,1\}^d)^n$, runs in time

$$\min\left\{\exp\left(d^{1-1/(C\sqrt{k})}\right),\ \exp\left(d/\log^{.99} d\right)\right\}$$

per query and answers any sequence Q of (possibly adaptively chosen) k-way marginal queries on
D up to an additive error of at most ±.01 on every query with probability at least 0.99, provided
that $|D| = \Omega(d^{.51} \log |Q| \log(1/\delta)/\varepsilon)$.

See Table 1 for a comparison of relevant results on privately answering marginal queries. 

When k is much smaller than d, it may be useful to view our algorithm as an offline algorithm for
releasing answers to all k-way marginal queries. This offline algorithm can be obtained simply by
requesting answers to each of the $d^{\Theta(k)}$ distinct k-way marginal queries from the online mechanism.
In this case we obtain the following corollary.

Corollary 1.2. There exists a constant C > 0 such that for every $k, d, n \in \mathbb{N}$, $k = O(d/\log d)$, and
every $\varepsilon, \delta > 0$, there is an $(\varepsilon, \delta)$-differentially private online algorithm that, on input a database
$D \in (\{0,1\}^d)^n$, runs in time

$$\min\left\{\exp\left(d^{1-1/(C\sqrt{k})}\right),\ \exp\left(d/\log^{.99} d\right)\right\}$$

per query and, with probability at least 0.99, releases answers to every k-way marginal query on D
up to an additive error of at most ±.01, provided that $|D| = \tilde{\Omega}(k d^{.51} \log(1/\delta)/\varepsilon)$.

We make a few additional remarks about our results: 
Remark 1. When $k = \Omega(\log d)$, the minimum database size requirement can be improved to
$|D| \ge C k d^{.5+o(1)} \log(1/\delta)/\varepsilon$ (for some universal constant C > 0), but we have stated the theorems
with a looser bound for simplicity. Here the o(1) means as a function of d.



^2 There is no differentially private algorithm that answers even 1-way marginal queries with non-trivial accuracy
on a database of size $o(\sqrt{d}/\log d)$ [Ull12, Vad].



Remark 2. Our algorithm can be modified so that instead of releasing approximate answers to 
each k-way marginal explicitly, it releases a summary of the database of size 0{kd'^^) from which 
an analyst can compute an approximate answer to any k-way marginal in time 0{kd^'^^). 

A key ingredient in our algorithm is a new approximate representation of the database by a 
low-degree polynomial of low weight. Our approximate database representation relies upon the 
construction of a low-degree polynomial of weight poly(d) that approximates the d-variate OR 
function on all inputs of Hamming weight at most k. Constructing such a polynomial of lower 
degree and similar weight would immediately yield a faster data release algorithm with a similar 
accuracy guarantee. Unfortunately, we prove a lower bound showing that no such polynomial exists. 
This lower bound may be of independent approximation-theoretic interest. 

Theorem 1.3. Let $\mathrm{OR}_d : \{-1,1\}^d \to \{-1,1\}$ denote the OR function on d variables, and for
any vector $x \in \{-1,1\}^d$, let $|x|$ denote $\sum_{i=1}^{d} \mathbb{1}_{\{x_i = -1\}}$, the number of coordinates of x equal to −1.
Fix $k = o(\log d)$, and let p be a real d-variate polynomial satisfying $|p(x) - \mathrm{OR}_d(x)| \le 1/6$ for all
$x \in \{-1,1\}^d$ with $|x| \le k$. If the sum of the absolute values of the coefficients of p is bounded by
$d^{O(1)}$, then the degree of p is at least $d^{1-O(1/\sqrt{k})}$.

We note that our lower bound limits the applicability of our algorithm design technique when 
we use low-degree polynomials (i.e. linear combinations of low-degree monomials) to uniformly 
approximate all disjunctions over d variables (and in turn, to represent the database). Several 
natural candidates (e.g. the set of small-width conjunctions) can themselves be computed exactly 
by a low-weight polynomial of low-degree, and thus our lower bound applies to these feature spaces 
as well. 

It is an interesting open question to decide whether or not there exists a smaller "feature-space" 
of functions other than low-degree parities or low-width conjunctions such that every disjunction 
over d variables can be uniformly approximated by a low-weight linear combination of features. An 
affirmative answer to this question would immediately yield a more efficient algorithm than ours 
with a similar accuracy guarantee. 

1.2 Techniques 

We now describe the algorithm promised by Theorem 1.1. For notational convenience, we focus
on monotone k-way disjunction queries. However, our results extend straightforwardly to general
non-monotone k-way marginal queries via simple transformations on the database and queries. A
monotone k-way disjunction is specified by a set $S \subseteq [d]$ of size k and asks what fraction of records
in D have at least one of the attributes in S set to 1.

Following the approach taken in many prior works on privately releasing conjunctions and other
families of queries, we view the database as a function $f_D : \{-1,1\}^d \to [0,1]$, in which each input
vector $s \in \{-1,1\}^d$ is interpreted as the indicator vector of a set $S \subseteq \{1, \dots, d\}$ (with $s_i = -1$ iff
$i \in S$), and $f_D(s)$ equals the evaluation of the conjunction query specified by S on the database D.
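The following sketch shows one way to evaluate this function view of the database for the monotone disjunctions that the rest of this subsection focuses on. The names `f_D` and `toy_db`, and the 0-indexed attributes, are illustrative assumptions.

```python
from typing import Sequence


def f_D(db: Sequence[Sequence[int]], s: Sequence[int]) -> float:
    """Answer to the monotone disjunction indexed by s in {-1, +1}^d.

    s[i] == -1 means attribute i belongs to the query set S; a row satisfies
    the disjunction when at least one attribute in S is set to 1.
    """
    def row_satisfies(row: Sequence[int]) -> bool:
        return any(s_i == -1 and x_i == 1 for s_i, x_i in zip(s, row))

    return sum(row_satisfies(row) for row in db) / len(db)


# Hypothetical database with n = 4 rows and d = 3 attributes.
toy_db = [(1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
s = (-1, 1, -1)  # indicator vector of S = {0, 2}
answer = f_D(toy_db, s)
```

Evaluating `f_D` this way costs time proportional to the database size; the point of the algorithm is to answer such queries privately, not faster.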

The starting point for our algorithm is the online private multiplicative weights algorithm
(PMW) [HR10], which has running time $2^d$ per query and answers any sequence of arbitrary counting
queries provided $|D| \gtrsim \sqrt{d} \log |Q|$. Gupta, Roth, and Ullman [GRU12] introduced the "IDC
framework" (capturing PMW and other algorithms) for designing differentially private online
algorithms and, in particular, showed that such an algorithm can be derived from any online
learning algorithm that may not necessarily be privacy preserving.

Informally, an online learning algorithm is one that takes a (possibly adaptively chosen) sequence
of inputs $s_1, s_2, \dots$ and returns answers $a_1, a_2, \dots$ to each, representing "guesses" about the values
$f_D(s_1), f_D(s_2), \dots$ for the unknown function $f_D$. After making each guess $a_i$, the learner is given
some information about the value of $f_D(s_i)$. The quantities of interest are the running time required
by the online learner to produce each guess $a_i$ and the number of "mistakes" made by the learner,
which is roughly the number of periods i in which $a_i$ is far from $f_D(s_i)$. Ultimately, for the
differentially private algorithm derived in the IDC framework, the notion of far will correspond to
the error, the per query running time will essentially be equal to the running time of the online
learning algorithm, and the minimum database size required by the private algorithm will be
proportional to the number of mistakes.

The algorithm of Hardt and Rothblum [HR10] is based on an online learning algorithm that
runs in time $2^d$ and makes $O(d)$ mistakes. A standard approach to obtaining a faster online learning
algorithm that still makes few mistakes is to use a polynomial approximation to the target function
$f_D$. Indeed, it is well-known that if $f_D$ can be approximated to high accuracy in the $L_\infty$ norm by
a d-variate polynomial $p_D : \{-1,1\}^d \to \mathbb{R}$ of degree t and $L_1$-weight W (defined to be the sum of
the absolute values of the coefficients), then there is an online learning algorithm that runs in time
$\mathrm{poly}\left(\binom{d}{\le t}\right)$ and makes $O(Wd)$ mistakes. Thus, if $t \ll d$, the running time of such an online learning
algorithm will be significantly less than $2^d$ and the number of mistakes (and thus the minimum
database size of the resulting private algorithm) will only blow up by a factor of W.
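To get a feel for the running-time gap, the short sketch below counts the multilinear monomials of degree at most t in d variables, which is the quantity that the polynomial-based learner's running time is polynomial in, and compares it with $2^d$. The function name and the chosen values of d and t are illustrative assumptions.

```python
from math import comb


def num_monomials(d: int, t: int) -> int:
    """Number of multilinear monomials of degree at most t in d variables."""
    return sum(comb(d, j) for j in range(t + 1))


d = 40
for t in (2, 5, 10, d):
    # At t = d the count reaches 2^d, so there is no saving over the generic learner.
    print(f"t={t:2d}  monomials={num_monomials(d, t):>15,d}  2^d={2 ** d:,d}")
```

The saving is large only when t is much smaller than d, which is why the degree of the approximating polynomial drives the running time.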

Our goal is therefore to demonstrate that for any database D, there is a low-degree, low-weight
polynomial $p_D$ such that $|p_D(s) - f_D(s)|$ is small for all vectors $s \in \{-1,1\}^d$ corresponding to
monotone k-way disjunction queries. To accomplish this, it is sufficient to construct a low-degree,
low-weight polynomial that can approximate the d-variate OR function on inputs of Hamming
weight at most k (those that have −1 in at most k indices). We achieve this, showing that for any
k there is a suitable polynomial of degree $d^{1-\Omega(1/\sqrt{k})}$ and weight $d^{.01}$. For larger values of k, for
which the previous bound becomes trivial, we show that a suitable polynomial exists with degree
$d/\log^{.99} d$ and weight $d^{o(1)}$.

We also prove a new approximation-theoretic lower bound of independent interest. The lower
bound suggests that we may have already reached the limit of our approach for designing efficient
private data release algorithms. Specifically, we show that for any $k = o(\log d)$, any polynomial p
of weight poly(d) that satisfies $|p(s) - \mathrm{OR}(s)| \le 1/6$ for all inputs $s \in \{-1,1\}^d$ of Hamming weight
at most k must have degree $d^{1-O(1/\sqrt{k})}$. We prove our lower bound by expressing the problem
of constructing such a low-weight, low-degree polynomial p as a linear program, and exhibiting
an explicit solution to the dual of this linear program. Our proof is inspired by recent work
(cf. Sherstov [She09, She11, She12b] and Bun and Thaler [BT13]) that proves new
approximation-theoretic lower bounds via the construction of dual solutions to appropriate linear programs.

Other Results on Privately Releasing Marginals. While we have focused on accurately
answering every k-way marginal query, or more generally every query in a sequence of marginal
queries, several other works have considered more relaxed notions of accuracy. These works show
how to efficiently release a summary of the database from which an analyst can efficiently
compute an approximate answer to marginal queries, with the guarantee that the average error of a
marginal query is at most .01, when the query is chosen from a particular distribution. In
particular, Feldman and Kothari [FK13] achieve small average error over the uniform distribution with
running time and database size $\tilde{O}(d^2)$; Gupta et al. [GHRU11] achieve small average error over
any product distribution with running time and minimum database size poly(d); finally Hardt et
al. [HRS12] show how to achieve small average error over arbitrary distributions with running time
and minimum database size $2^{\tilde{O}(d^{1/3})}$. All of these results are based on the approach of learning the
function $f_D$.



Several works have also considered information theoretic bounds on the minimum database size
required to answer k-way marginals. Kasiviswanathan et al. [KRSU10] showed that the minimum

database size must be at least minj-!^, } to answer all /c-way marginals with error ita. In the 

regime we consider where $\alpha = \Omega(1)$, their results do not give a non-trivial lower bound.

Relationship with Hardness Results for Differential Privacy. Ullman [Ull12] (building on
the results of Dwork et al. [DNR+09]) showed that any $2^{o(d)}$-time differentially private algorithm
that answers arbitrary counting queries can only give accurate answers if $|D| \gtrsim |Q|^{1/2}$, assuming
the existence of exponentially hard one-way functions. Our algorithms have running time $2^{o(d)}$ and
are accurate when $|D| \ll |Q|^{1/2}$, and thus show a separation between answering marginal queries
and answering arbitrary counting queries.

When viewed as an offline algorithm for answering all k-way marginals, our algorithm will
return a list of values containing answers to each k-way marginal query. It would in some cases
be more attractive if we could return a synthetic database, which is a new database $\hat{D} \in (\{0,1\}^d)^{\hat{n}}$
whose rows are "fake", but such that $\hat{D}$ approximately preserves many of the statistical properties
of the database D (e.g. all the marginals). Some of the previous work on counting query release
has provided synthetic data, starting with Barak et al. [BCD+07] and including [BLR08, DNR+09,
DRV10, HLM12].



Unfortunately, Ullman and Vadhan [UV11] (building on [DNR+09]) have shown that no
differentially private sanitizer with running time poly(d) can take a database $D \in (\{0,1\}^d)^n$ and output
a private synthetic database $\hat{D}$, all of whose 2-way marginals are approximately equal to those of
D (assuming the existence of one-way functions). They also showed that there is no differentially
private sanitizer with running time $2^{d^{1-\Omega(1)}}$ that can output a private synthetic database, all of whose
2-way marginals are approximately equal to those of D. Our algorithms indeed achieve this running
time and accuracy guarantee when releasing k-way marginals for constant k, and thus it is inherent
that our algorithms do not generate synthetic data.

Relationship with Results in Learning and Approximation Theory. Servedio, Tan, and
Thaler [STT12] focused on developing low-weight, low-degree polynomial threshold functions (PTFs)
for decision lists, motivated by applications in computational learning theory. As an intermediate
step in their PTF constructions, they constructed low-weight, low-degree polynomials that
approximate the OR function on all Boolean inputs. Our construction of lower-weight, lower-degree
polynomials that approximate the OR function on low Hamming weight inputs is inspired by and
builds on Servedio et al.'s construction for the entire Boolean hypercube.

The proof of our lower bound is inspired by recent work that has established new approximate 
degree lower bounds via the construction of dual solutions to certain linear programs. In particular, 
Sherstov [She09] showed that approximate degree and PTF degree behave roughly multiplicatively
under function composition, while Bun and Thaler [BT13] gave a refinement of Sherstov's method
in order to resolve the approximate degree of the two-level AND-OR tree, and also gave an explicit 
dual witness for the approximate degree of any symmetric Boolean function. We extend these lower 
bounds along two directions: (1) we show degree lower bounds that take into account the size of 
the coefficients of the approximating polynomial, and (2) our lower bounds hold even when we only 
require the approximation to be accurate on inputs of low Hamming weight, while prior work only 
considered approximations that are accurate on the entire Boolean hypercube. 

Some prior work has studied the degree of polynomials that point-wise approximate partial
Boolean functions [She12b, She12a]. Here, a function $f : Y \to \mathbb{R}$ is said to be partial if its domain
Y is a strict subset of $\{-1,1\}^d$, and a polynomial p is said to ε-approximate f if

1. $|f(x) - p(x)| \le \varepsilon$ for all $x \in Y$, and

2. $|p(x)| \le 1 + \varepsilon$ for all $x \in \{-1,1\}^d \setminus Y$.

In contrast, our lower bounds apply even in the absence of Condition 2, i.e. when p(x) is allowed
to take arbitrary values on $\{-1,1\}^d \setminus Y$.

Finally, while our motivation is private data release, our approximation theoretic results are
similar in spirit to recent work of Long and Servedio [LS13], who are motivated by applications in
computational learning theory. Long and Servedio consider halfspaces h defined on inputs of small
Hamming weight, and (using techniques very different from ours) give upper and lower
bounds on the weight of these halfspaces when represented as linear threshold functions.

Organization. In Section 3 we describe our differentially private online algorithm and show
that it yields the claimed accuracy given the existence of sufficiently low-weight polynomials that
uniformly approximate the d-variate OR function on inputs of low Hamming weight. The results
of this section are a combination of known techniques in differential privacy [RR10, HR10, GRU12]
and learning theory (see e.g. [KS04]). Readers familiar with these literatures may prefer to skip
Section 3 on first reading. In Section 4 we give our polynomial approximations to the OR function,
both on low-weight inputs and on inputs from the entire Boolean cube. Finally, in Section 5 we
state and prove our lower bounds for polynomial approximations to the OR function on restricted
inputs.

2 Preliminaries 

2.1 Differentially Private Sanitizers 

Let a database $D \in \mathcal{X}^n$ be a collection of n rows $x^{(1)}, \dots, x^{(n)}$ from a data universe $\mathcal{X}$. We say
that two databases $D, D' \in \mathcal{X}^n$ are adjacent if they differ only on a single row, and we denote this
by $D \sim D'$.

Let $\mathcal{A} : \mathcal{X}^n \to \mathcal{R}$ be an algorithm that takes a database as input and outputs some data
structure in $\mathcal{R}$. We are interested in algorithms that satisfy differential privacy.

Definition 2.1 (Differential Privacy [DMNS06]). An algorithm $\mathcal{A} : \mathcal{X}^n \to \mathcal{R}$ is $(\varepsilon, \delta)$-differentially
private if for every two adjacent databases $D \sim D' \in \mathcal{X}^n$ and every subset $S \subseteq \mathcal{R}$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D') \in S] + \delta.$$
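As a small worked check of this inequality, the sketch below verifies it by brute force over all output subsets for randomized response on a single bit. Randomized response is a standard mechanism used here only as an illustration; it is not an algorithm from this paper, and all function names are hypothetical.

```python
import math
from itertools import chain, combinations
from typing import Dict, Hashable


def satisfies_dp(p: Dict[Hashable, float], q: Dict[Hashable, float],
                 eps: float, delta: float, tol: float = 1e-12) -> bool:
    """Check Pr_p[S] <= e^eps * Pr_q[S] + delta, and the symmetric bound, for every S."""
    outcomes = sorted(set(p) | set(q), key=repr)
    subsets = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1))
    for subset in subsets:
        mass_p = sum(p.get(o, 0.0) for o in subset)
        mass_q = sum(q.get(o, 0.0) for o in subset)
        if mass_p > math.exp(eps) * mass_q + delta + tol:
            return False
        if mass_q > math.exp(eps) * mass_p + delta + tol:
            return False
    return True


def randomized_response(bit: int, eps: float) -> Dict[int, float]:
    """Output distribution: report the true bit with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}


eps = 0.5
on_zero = randomized_response(0, eps)  # output distribution on the one-row database D = (0)
on_one = randomized_response(1, eps)   # output distribution on the adjacent database D' = (1)
```

The check passes at ε = 0.5 with δ = 0 and fails for any smaller ε, since the ratio of the two probabilities of reporting the true bit is exactly $e^{0.5}$.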

Since a sanitizer that always outputs ⊥ satisfies Definition 2.1, we focus on sanitizers that
are accurate. In particular, we are interested in sanitizers that give accurate answers to counting
queries. A counting query is defined by a boolean predicate $q : \mathcal{X} \to \{0,1\}$. Abusing notation, we
define the evaluation of the query q on a database $D \in \mathcal{X}^n$ to be $q(D) = \frac{1}{n} \sum_{i=1}^{n} q(x^{(i)})$. Note that
the value of a counting query is in [0, 1]. We use Q to denote a set of counting queries.

For the purposes of this work, we assume that the range of $\mathcal{A}$ is simply $\mathbb{R}^{|Q|}$. That is, $\mathcal{A}$ outputs
a list of real numbers representing answers to each of the specified queries.

Definition 2.2 (Accuracy). The output of $\mathcal{A}(D)$, $a = (a_q)_{q \in Q}$, is α-accurate for the query set Q if

$$\forall q \in Q, \quad |a_q - q(D)| \le \alpha.$$

A sanitizer is (α, β)-accurate for the query set Q if for every database D, $\mathcal{A}(D)$ outputs a such
that with probability at least 1 − β, a is α-accurate for Q, where the probability is taken over the
coins of $\mathcal{A}$.
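The two definitions above translate directly into code. The sketch below evaluates counting queries and checks α-accuracy of a released answer vector; the query names, the toy database, and the released values are made up for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[int, ...]
Predicate = Callable[[Row], int]  # a counting query's predicate q: X -> {0, 1}


def evaluate_query(q: Predicate, db: Sequence[Row]) -> float:
    """q(D): the average of the predicate over the rows of D, a value in [0, 1]."""
    return sum(q(x) for x in db) / len(db)


def is_alpha_accurate(answers: Dict[str, float], queries: Dict[str, Predicate],
                      db: Sequence[Row], alpha: float) -> bool:
    """True when every released answer is within alpha of the true value."""
    return all(abs(answers[name] - evaluate_query(q, db)) <= alpha
               for name, q in queries.items())


db = [(1, 0), (1, 1), (0, 1), (0, 0)]
queries = {"first": lambda x: x[0], "both": lambda x: x[0] & x[1]}
released = {"first": 0.505, "both": 0.255}  # hypothetical sanitizer output
```

The (α, β) notion then asks that this check succeed with probability at least 1 − β over the sanitizer's randomness.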




We remark that the definitions of both differential privacy and (α, β)-accuracy extend
straightforwardly to the online setting. Here the algorithm receives a sequence of $\ell$ (possibly adaptively
chosen) queries from Q and must give an answer to each before seeing the rest of the sequence.
Here we require that with probability at least 1 − β, every answer given is within ±α of the true
answer on D. See e.g. [HR10] for a full treatment of the online setting.

2.2 Query Function Families 

Given a set of queries of interest, Q (e.g. all marginal queries), we think of the database D as
specifying a function $f_D$ mapping queries q to their answers q(D), which we call the Q-representation
of D. We now describe this transformation more formally:

Definition 2.3 (Q-Function Family). Let $Q = \{q_y\}_{y \in Y_Q \subseteq \{-1,1\}^m}$ be a set of counting queries on a
data universe $\mathcal{X}$, where each query is indexed by an m-bit string. We define the index set of Q to
be the set $Y_Q = \{y \in \{-1,1\}^m \mid q_y \in Q\}$.

We define the Q-function family $\mathcal{F}_Q = \{f_{Q,x} : \{-1,1\}^m \to [0,1]\}_{x \in \mathcal{X}}$ as follows: For every possible
database row $x \in \mathcal{X}$, the function $f_{Q,x} : \{-1,1\}^m \to [0,1]$ is defined as $f_{Q,x}(y) = q_y(x)$. Given a
database $D \in \mathcal{X}^n$ we define the function $f_{Q,D} : \{-1,1\}^m \to [0,1]$ where $f_{Q,D}(y) = \frac{1}{n} \sum_{i=1}^{n} f_{Q,x^{(i)}}(y)$.
When Q is clear from context we will drop the subscript Q and simply write $f_x$, $f_D$, and $\mathcal{F}$.

When Q is the set of all monotone k-way disjunctions on a database $D \in (\{0,1\}^d)^n$, the queries
are defined by sets $S \subseteq [d]$, $|S| \le k$. In this case, we represent each query by the d-bit −1/1
indicator vector $y_S$ of the set S, where $y_S(i) = -1$ if and only if $i \in S$. Thus, $y_S$ has at most k
entries that are −1. Hence, we can take m = d and
$Y_Q = \left\{ y \in \{-1,1\}^d \;\middle|\; \sum_{i=1}^{d} \mathbb{1}_{\{y_i = -1\}} \le k \right\}$.

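2.3 Low-Weight Polynomial Approximations

Given an m-variate real polynomial $p : \{-1,1\}^m \to \mathbb{R}$,

$$p(y) = \sum_{S \subseteq [m]} c_S \cdot \prod_{i \in S} y_i,$$

we define the degree, weight $w(\cdot)$ and non-constant weight $w^*(\cdot)$ of the polynomial as follows:

$$\deg(p) := \max\{|S| : S \subseteq [m],\ c_S \ne 0\},$$
$$w(p) := \sum_{S \subseteq [m]} |c_S|, \text{ and}$$
$$w^*(p) := \sum_{S \subseteq [m],\ S \ne \emptyset} |c_S|.$$

We use $\binom{[m]}{\le t}$ to denote $\{S \subseteq [m] \mid |S| \le t\}$ and $\binom{m}{\le t} = \left|\binom{[m]}{\le t}\right| = \sum_{j=0}^{t} \binom{m}{j}$.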
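A minimal sketch of these three quantities, assuming a polynomial is stored as a dictionary from index sets S to coefficients $c_S$ (a representation chosen here for illustration, with 0-indexed variables). The example polynomial is the exact multilinear expansion of OR on two variables under the convention −1 = TRUE, +1 = FALSE.

```python
from math import prod
from typing import Dict, FrozenSet, Sequence

Poly = Dict[FrozenSet[int], float]  # maps S to the coefficient c_S


def eval_poly(p: Poly, y: Sequence[int]) -> float:
    """p(y) = sum over S of c_S times the product of y_i for i in S."""
    return sum(c * prod(y[i] for i in S) for S, c in p.items())


def degree(p: Poly) -> int:
    return max((len(S) for S, c in p.items() if c != 0), default=0)


def weight(p: Poly) -> float:
    return sum(abs(c) for c in p.values())


def non_constant_weight(p: Poly) -> float:
    # Skips the empty set, i.e. the constant term.
    return sum(abs(c) for S, c in p.items() if S)


# OR_2(y) = (-1 + y_0 + y_1 + y_0 * y_1) / 2 on {-1, +1}^2.
or2: Poly = {frozenset(): -0.5, frozenset({0}): 0.5,
             frozenset({1}): 0.5, frozenset({0, 1}): 0.5}
```

For this polynomial the degree is 2, the weight is 2, and the non-constant weight is 1.5.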

In many cases, the functions $f_{Q,x} : \{-1,1\}^m \to \{0,1\}$ can be approximated well on all the
indices in $Y_Q$ by a family of polynomials with low degree and low weight. Formally and more
generally:

Definition 2.4 (Restricted Approximation by Polynomials). Given a function $f : Y \to \mathbb{R}$, where
$Y \subseteq \mathbb{R}^m$, and a subset $Y' \subseteq Y$, we denote the restriction of f to Y' by $f|_{Y'}$. Given an m-variate
real polynomial p, we say that p is a γ-approximation to the restriction $f|_{Y'}$ if $|f(y) - p(y)| \le \gamma$
for all $y \in Y'$. Notice there is no restriction whatsoever placed on p(y) for $y \in Y \setminus Y'$.



Given a family of m-variate functions $\mathcal{F} = \{f_x : Y \to \mathbb{R}\}_{x \in \mathcal{X}}$, where $Y \subseteq \mathbb{R}^m$, a set $Y' \subseteq Y$ and
a family $\mathcal{P}$ of m-variate real polynomials, we say that the family $\mathcal{P}$ is a γ-approximation to $\mathcal{F}|_{Y'}$ if
for every $x \in \mathcal{X}$, there exists $p_x \in \mathcal{P}$ that is a γ-approximation to $f_x|_{Y'}$.

Let $H_{m,k} = \{x \in \{-1,1\}^m : \sum_{i=1}^{m} (1 - x_i)/2 \le k\}$ denote the set of inputs of Hamming weight
at most k. We view the d-variate OR function, $\mathrm{OR}_d$, as mapping inputs from $\{-1,1\}^d$ to $\{-1,1\}$,
with the convention that −1 is TRUE and 1 is FALSE. Let $\mathcal{P}_{t,W}$ denote the family of all m-variate
real polynomials of degree t and weight W. For the upper bound, we will show that for certain
small values of t and W, the family $\mathcal{P}_{t,W}$ is a γ-approximation to the family of all disjunctions
restricted to $H_{m,k}$.

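Fact 2.5. If Q is the set of all monotone k-way disjunctions on a database $D \in (\{0,1\}^d)^n$, $\mathcal{F}$ is
its function family, and $Y = H_{d,k}$ is its index set, then $\mathcal{P}_{t,W}$ is a γ-approximation to the restriction
$\mathcal{F}|_Y$ if and only if there is a degree t polynomial of weight W that γ-approximates $\mathrm{OR}_d|_{H_{d,k}}$.

The fact follows easily by observing that for any $x \in \{0,1\}^d$, $y \in \{-1,1\}^d$,

$$f_x(y) = \bigvee_{i \,:\, x_i = 1} \mathbb{1}_{\{y_i = -1\}} = \mathrm{OR}_d\left(y_1^{x_1}, \dots, y_d^{x_d}\right),$$

where the last equality identifies truth values under the convention above (−1 is TRUE and 1 is FALSE).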
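The observation behind Fact 2.5 can be checked by brute force for small d. In the sketch below, the 0/1 value of the disjunction on a single row is compared with $(1 - \mathrm{OR}_d(y_1^{x_1}, \dots, y_d^{x_d}))/2$, which is one way (assumed here) of translating the −1/+1 truth convention back to 0/1. The function names are illustrative.

```python
from itertools import product
from typing import Sequence


def or_pm(z: Sequence[int]) -> int:
    """OR on {-1, +1} inputs, with -1 = TRUE and +1 = FALSE."""
    return -1 if any(z_i == -1 for z_i in z) else 1


def f_x(x: Sequence[int], y: Sequence[int]) -> int:
    """Value (0 or 1) of the monotone disjunction indexed by y on the single row x."""
    return int(any(y_i == -1 and x_i == 1 for x_i, y_i in zip(x, y)))


def identity_holds(d: int) -> bool:
    """Check f_x(y) == (1 - OR_d(y_1^{x_1}, ..., y_d^{x_d})) / 2 on all inputs."""
    for x in product((0, 1), repeat=d):
        for y in product((-1, 1), repeat=d):
            z = [y_i ** x_i for x_i, y_i in zip(x, y)]  # y_i^{x_i} equals 1 when x_i = 0
            if f_x(x, y) != (1 - or_pm(z)) // 2:
                return False
    return True
```

Because the substitution $y_i \mapsto y_i^{x_i}$ either keeps a variable or replaces it by the constant 1, it cannot increase the degree of a polynomial applied to the result.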



For the lower bound, we will show that any collection of polynomials with small weight that is
a γ-approximation to the family of disjunctions restricted to $H_{m,k}$ should have large degree. We
need the following definitions:

Definition 2.6 (Approximate Degree). Given a function $f : Y \to \mathbb{R}$, where $Y \subseteq \mathbb{R}^m$, the
γ-approximate degree of f is

$$\deg_\gamma(f) := \min\{d : \exists \text{ real polynomial } p \text{ that is a } \gamma\text{-approximation to } f,\ \deg(p) = d\}.$$

Analogously, the (γ, W)-approximate degree of f is

$$\deg_{(\gamma, W)}(f) := \min\{d : \exists \text{ real polynomial } p \text{ that is a } \gamma\text{-approximation to } f,\ \deg(p) = d,\ w(p) \le W\}.$$

It is clear that $\deg_\gamma(f) = \deg_{(\gamma, \infty)}(f)$.

We let $w^*(f, t)$ denote the degree-t non-constant margin weight of f, defined to be:

$$w^*(f, t) := \min\{w^*(p) : \deg(p) \le t,\ f(y)p(y) \ge 1 \ \forall y \in Y\}.$$

The above definitions extend naturally to the restricted function $f|_{Y'}$.

Our definition of non-constant margin weight is closely related to the well-studied notion of the
degree-t polynomial threshold function (PTF) weight of f (see e.g. [She11]), which is defined as
$\min_p w(p)$, where the minimum is taken over all degree-t polynomials p with integer coefficients, such
that $f(x) = \mathrm{sign}(p(x))$ for all $x \in \{-1,1\}^d$. Often, when studying PTF weight, the requirement
that p have integer coefficients is used only to ensure that p has non-trivial margin, i.e. that
$|p(x)| \ge 1$ for all $x \in \{-1,1\}^d$; this is precisely the requirement captured in our definition of
non-constant margin weight. We choose to work with margin weight because it is a cleaner quantity to
analyze using linear programming duality; PTF weight can also be studied using LP duality, but
the integrality constraints on the coefficients of p introduce an integrality gap that causes some
loss in the analysis (see e.g. Sherstov [She11, Theorem 3.4] and Klauck [Kla11, Section 4.3]).



3 From Low-Weight Approximations to Private Data Release

In this section we show that low-weight polynomial approximations imply data release algorithms
that provide approximate answers even on small databases. Informally, if the family of low-weight,
low-degree m-variate polynomials $\mathcal{P}_{t,W}$ (1/400)-approximates $\mathcal{F}_Q$, then there is a differentially
private online algorithm with running time $\mathrm{poly}\left(\binom{m}{\le t}, |Q|\right)$ that releases answers to every
query sequence of $\ell$ queries in Q within error ±.01 as long as $n \gtrsim W\sqrt{m}\,\log \ell/\varepsilon$.

The results in this section can be assembled from known techniques in the design and analysis of 
differentially private algorithms and online learning algorithms. We include them for completeness, 
as to our knowledge they do not explicitly appear in the privacy literature. 

We construct and analyze the algorithm in two steps. First, we use standard arguments to
show that the non-private multiplicative weights algorithm can be used to construct a suitable
online learning algorithm for $f_{Q,D}$ whenever $f_{Q,D}$ can be approximated by a low-weight, low-degree
polynomial. Here, a suitable online learning algorithm is one that fits into the IDC framework of
Gupta et al. [GRU12]. We then apply the generic conversion from IDCs to differentially private
online algorithms [RR10, HR10, GRU12] to obtain our algorithm.
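The sketch below illustrates the general shape of a non-private multiplicative weights learner over low-degree monomials. It is a textbook-style construction written under stated assumptions, not necessarily the exact update rule or parameter setting analyzed in this paper: the class name, the learning rate `eta`, and the use of both signs of each monomial as features are illustrative choices.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Feature = Tuple[Tuple[int, ...], int]  # (monomial index set S, sign)


class MWPolynomialLearner:
    """Multiplicative weights over signed monomials of degree at most t."""

    def __init__(self, m: int, t: int, weight_bound: float, eta: float) -> None:
        monomials = [S for r in range(t + 1) for S in combinations(range(m), r)]
        # Both signs of every monomial, so any polynomial of weight at most
        # weight_bound is weight_bound times a convex combination of features
        # (a +/- pair can absorb any unused mass, since the pair cancels).
        self.features: List[Feature] = [(S, sgn) for S in monomials for sgn in (1, -1)]
        self.probs = [1.0 / len(self.features)] * len(self.features)
        self.weight_bound = weight_bound
        self.eta = eta

    @staticmethod
    def _feature_value(feature: Feature, y: Sequence[int]) -> int:
        S, sgn = feature
        return sgn * math.prod(y[i] for i in S)

    def predict(self, y: Sequence[int]) -> float:
        total = sum(w * self._feature_value(f, y)
                    for w, f in zip(self.probs, self.features))
        return self.weight_bound * total

    def update(self, y: Sequence[int], estimate: float) -> None:
        """Shift mass toward features that move predict(y) toward the estimate."""
        direction = 1.0 if estimate > self.predict(y) else -1.0
        scaled = [w * math.exp(self.eta * direction * self._feature_value(f, y))
                  for w, f in zip(self.probs, self.features)]
        norm = sum(scaled)
        self.probs = [w / norm for w in scaled]


learner = MWPolynomialLearner(m=4, t=2, weight_bound=2.0, eta=0.1)
y = (-1, 1, 1, -1)
before = learner.predict(y)
learner.update(y, estimate=0.75)  # hypothetical noisy estimate of f_D(y)
after = learner.predict(y)
```

Only the sign of `estimate - predict(y)` is used in the update, and each update touches one weight per feature, so the per-update cost is linear in the number of signed monomials of degree at most t.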

3.1 IDCs 

We start by providing the relevant background on the iterative database construction framework.
To cut down on notational clutter, we give the definition specialized to the case where the database
is to be represented by a polynomial of low-degree and low-weight. Roughly, each of these
mechanisms works by maintaining a sequence of polynomials $p^{(1)}, p^{(2)}, \dots \in \mathcal{P}_{t,W}$ that give increasingly
good approximations to the Q-representation $f_D$. Moreover, the mechanism produces the next
polynomial in the sequence by considering only one query $y^{(t)}$ that "distinguishes" the real database
in the sense that $|p^{(t)}(y^{(t)}) - f_D(y^{(t)})|$ is large.

Syntactically, we will consider functions of the form $U : \mathcal{P}_{t,W} \times Q \times \mathbb{R} \to \mathcal{P}_{t,W}$. The inputs to
U are a polynomial $p^{(t)} \in \mathcal{P}_{t,W}$, which is the current polynomial approximation; a query $y \in Y_Q$,
which represents the distinguishing query; and also a real number that estimates $f_D(y)$. Formally,
we define a database update sequence, to capture the sequence of inputs to U used to generate the
database sequence $p^{(1)}, p^{(2)}, \dots$.

Definition 3.1 (Database Update Sequence). Let $D \in (\{0,1\}^d)^n$ be any database and let
$\{(p^{(t)}, y^{(t)}, a^{(t)})\}_{t=1,\dots,C} \in (\mathcal{P}_{t,W} \times \mathcal{Q} \times \mathbb{R})^C$ be a sequence of tuples. We say the sequence is a
$(\mathbf{U}, D, \mathcal{Q}, \alpha, C)$-database update sequence if it satisfies the following properties:

1. $p^{(1)} = \mathbf{U}(\emptyset, \cdot, \cdot)$,

2. for every $t = 1, 2, \dots, C$, $|f_D(y^{(t)}) - p^{(t)}(y^{(t)})| \ge \alpha$,

3. for every $t = 1, 2, \dots, C$, $|f_D(y^{(t)}) - a^{(t)}| \le \alpha/2$,

4. and for every $t = 1, 2, \dots, C-1$, $p^{(t+1)} = \mathbf{U}(p^{(t)}, y^{(t)}, a^{(t)})$.

We note that for all of the iterative database constructions we consider, the approximate answer
$a^{(t)}$ is used only to determine the sign of $f_D(y^{(t)}) - p^{(t)}(y^{(t)})$, which is the motivation for requiring
that $a^{(t)}$ have error smaller than $\alpha$. The main measures of efficiency we are interested in from an
iterative database construction are the maximum number of updates we need to perform before
the database $p^{(t)}$ approximates $D$ well with respect to the queries in $\mathcal{Q}$, and the time required to
compute $\mathbf{U}$. To this end we define an iterative database construction as follows:

Definition 3.2 (Iterative Database Construction). Let $\mathbf{U} : \mathcal{P}_{t,W} \times \mathcal{Q} \times \mathbb{R} \to \mathcal{P}_{t,W}$ be an update
rule and let $B : \mathbb{R} \to \mathbb{R}$ be a function. We say $\mathbf{U}$ is a $B(\alpha)$-iterative database construction for
query class $\mathcal{Q}$ if for every database $D \in (\{0,1\}^d)^n$, every $(\mathbf{U}, D, \mathcal{Q}, \alpha, C)$-database update sequence
satisfies $C \le B(\alpha)$.

Note that, by definition, if $\mathbf{U}$ is a $B(\alpha)$-iterative database construction, then given any maximal
$(\mathbf{U}, D, \mathcal{Q}, \alpha, C)$-database update sequence, the final database $p^{(C)}$ must satisfy

$$\forall y \in \mathcal{Y}_{\mathcal{Q}}, \quad |f_D(y) - p^{(C)}(y)| \le \alpha,$$

or else there would exist another query satisfying property 2 of Definition 3.1, and thus there would
exist a $(\mathbf{U}, D, \mathcal{Q}, \alpha, C+1)$-database update sequence, contradicting maximality.

Theorem 3.3 (Variant of [GRU12]). For any $\alpha > 0$, and any family of linear queries $\mathcal{Q}$, if there is
a $B(\alpha)$-iterative database construction, $\mathbf{U}$, for $\mathcal{Q}$, then there is an $(\epsilon, \delta)$-differentially private online
algorithm that is $(4\alpha, \beta)$-accurate for any sequence of $\ell$ (possibly adaptively chosen) queries from
$\mathcal{Q}$ so long as

$$n \ge \frac{1000\sqrt{B(\alpha)}\,\log(\ell/\beta)\,\log(4/\delta)}{\alpha\epsilon}.$$

Moreover, if $\mathbf{U}$ runs in time $T_{\mathbf{U}}$, then the private algorithm has running time $\mathrm{poly}(T_{\mathbf{U}})$ per query.

The IDC we will use is specified in Algorithm 1. We note that the algorithm will represent a
polynomial as a vector $\hat{p}$ of length $2\binom{m}{\le t} + 1$ with only non-negative entries. For each coefficient
$S \in \binom{[m]}{\le t}$, the vector will have two components $\hat{p}_S, \hat{p}_{\neg S}$. Intuitively these two entries represent the
positive part and negative part of the coefficient $c_S$ of $p$. There will also be an additional entry $\hat{p}_0$
that is used to ensure that the $L_1$-norm of the vector is exactly 1. Given a polynomial $p \in \mathcal{P}_{t,W}$
with coefficients $(c_S)$, we can construct this vector by setting

$$\hat{p}_S = \frac{\max\{0, c_S\}}{W}, \qquad \hat{p}_{\neg S} = \frac{\max\{0, -c_S\}}{W}$$

and choosing $\hat{p}_0$ so that $\|\hat{p}\|_1 = 1$. Observe that $\hat{p}_0$ can always be set appropriately since the weight
of $p$ is at most $W$.

We observe two things about $\hat{p}$: (1) Given a query $y \in \{-1,1\}^m$, we can construct a vector $\hat{y}$
of length $2\binom{m}{\le t} + 1$ in which $\hat{y}_0 = 0$, $\hat{y}_S = \prod_{i \in S} y_i$ and $\hat{y}_{\neg S} = -\prod_{i \in S} y_i$. This vector will satisfy
$W\langle \hat{p}, \hat{y} \rangle = p(y)$. (2) $\hat{p}$ now represents a probability distribution on the $2\binom{m}{\le t} + 1$ "coefficients."
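The vector representation just described can be illustrated with a short Python sketch. The function names and the ordering of the entries (slack entry first, then positive parts, then negative parts) are illustrative assumptions and are not fixed by the text; the sketch only checks the identity $W\langle \hat{p}, \hat{y}\rangle = p(y)$ on a toy polynomial.

```python
from itertools import combinations
from math import prod

def monomials(m, t):
    """All subsets S of {0,...,m-1} with |S| <= t, as sorted tuples."""
    return [S for r in range(t + 1) for S in combinations(range(m), r)]

def encode_poly(coeffs, m, t, W):
    """Map {S: c_S} with sum |c_S| <= W to a non-negative vector of L1-norm 1.

    Layout (an illustrative choice): [slack, positive parts, negative parts].
    """
    mons = monomials(m, t)
    pos = [max(0.0, coeffs.get(S, 0.0)) / W for S in mons]
    neg = [max(0.0, -coeffs.get(S, 0.0)) / W for S in mons]
    slack = 1.0 - sum(pos) - sum(neg)
    assert slack >= -1e-12, "the weight of p exceeds W"
    return [max(slack, 0.0)] + pos + neg

def encode_query(y, m, t):
    """Map y in {-1,1}^m to (0, chi_S(y) for all S, -chi_S(y) for all S)."""
    chi = [prod(y[i] for i in S) for S in monomials(m, t)]
    return [0.0] + chi + [-c for c in chi]

def evaluate(coeffs, y):
    """Evaluate the polynomial sum_S c_S * prod_{i in S} y_i directly."""
    return sum(c * prod(y[i] for i in S) for S, c in coeffs.items())

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For example, with `coeffs = {(): 0.5, (0,): -1.0, (0, 2): 0.25}`, `m = 3`, `t = 2` and `W = 2`, the encoded vector has length 15, sums to 1, and `W * inner(encode_poly(...), encode_query(y, 3, 2))` agrees with `evaluate(coeffs, y)` on every `y`.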

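We summarize the properties of the multiplicative weights algorithm in the following theorem:

Theorem 3.4. For any $\alpha > 0$, and any family of linear queries $\mathcal{Q}$, if $\mathcal{P}_{t,W}$ $(\alpha/4)$-approximates the
restriction of $\mathcal{F}_{\mathcal{Q}}$ to $\mathcal{Y}_{\mathcal{Q}}$, then Algorithm 1 is a $B(\alpha)$-iterative database construction, $\mathbf{U}$, for $\mathcal{Q}$ with

$$B(\alpha) = \frac{16 W^2 \log\left(2\binom{m}{\le t} + 1\right)}{\alpha^2}.$$

Moreover, $\mathbf{U}$ runs in time $\mathrm{poly}(\binom{m}{\le t})$.

Algorithm 1 The Multiplicative Weights Algorithm for Low-Weight Polynomials.

    Let $\eta \leftarrow \alpha/4W$.
    If: $\hat{p}^{(t)} = \emptyset$ then: output $\hat{p}^{(1)} = \frac{1}{2\binom{m}{\le t}+1}(1, \dots, 1)$, a representation of the constant 0 polynomial.
    Else if: $a^{(t)} < W\langle \hat{p}^{(t)}, \hat{y}^{(t)} \rangle$
        Let $r^{(t)} = \hat{y}^{(t)}$
    Else:
        Let $r^{(t)} = -\hat{y}^{(t)}$
    Update: For all $S \in \binom{[m]}{\le t}$ let
        $\tilde{p}^{(t+1)}_S \leftarrow \exp(-\eta r^{(t)}_S) \cdot \hat{p}^{(t)}_S$, $\quad \tilde{p}^{(t+1)}_{\neg S} \leftarrow \exp(-\eta r^{(t)}_{\neg S}) \cdot \hat{p}^{(t)}_{\neg S}$
        (the entry with index 0 is left unchanged, since $r^{(t)}_0 = 0$).
    Output $\hat{p}^{(t+1)} = \tilde{p}^{(t+1)} / \|\tilde{p}^{(t+1)}\|_1$.

Proof. Let $D \in (\{0,1\}^d)^n$ be any database and consider a $(\mathbf{U}, D, \mathcal{Q}, \alpha, B)$-database update
sequence $\{(\hat{p}^{(t)}, \hat{y}^{(t)}, a^{(t)})\}_{t=1,\dots,B}$. It will be sufficient if we can show that $B \le 16W^2 \log(2\binom{m}{\le t} + 1)/\alpha^2$.
Specifically, that after $B = 16W^2 \log(2\binom{m}{\le t} + 1)/\alpha^2$ invocations of $\mathbf{U}$, the polynomial
$\hat{p}^{(B)}$ is such that

$$\forall y \in \mathcal{Y}_{\mathcal{Q}}, \quad |W\langle \hat{p}^{(B)}, \hat{y} \rangle - f_D(y)| \le \alpha.$$

That is, $\hat{p}^{(B)}$ represents a polynomial that approximates $f_D$.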

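First, we note that there always exists a polynomial $p_D \in \mathcal{P}_{t,W}$ such that

$$\forall y \in \mathcal{Y}_{\mathcal{Q}}, \quad |p_D(y) - f_D(y)| \le \frac{\alpha}{4}. \qquad (1)$$

The assumption of our theorem is that for every $x^{(i)} \in D$, there exists $p_{x^{(i)}} \in \mathcal{P}_{t,W}$ such that

$$\forall y \in \mathcal{Y}_{\mathcal{Q}}, \quad |p_{x^{(i)}}(y) - f_{x^{(i)}}(y)| \le \frac{\alpha}{4}.$$

Thus, since $f_D = \frac{1}{n}\sum_{i=1}^{n} f_{x^{(i)}}$, the polynomial $p_D = \frac{1}{n}\sum_{i=1}^{n} p_{x^{(i)}}$ will satisfy (1). Note that
$p_D \in \mathcal{P}_{t,W}$, thus if we represent $p_D$ as a vector $\hat{p}_D$,

$$\forall y \in \mathcal{Y}_{\mathcal{Q}}, \quad |W\langle \hat{p}_D, \hat{y} \rangle - f_D(y)| \le \frac{\alpha}{4}.$$

Given the existence of $\hat{p}_D$, we will define a potential function capturing how far $\hat{p}^{(t)}$ is from $\hat{p}_D$.
Specifically, we define

$$\Psi_t := KL(\hat{p}_D \,\|\, \hat{p}^{(t)}) = \sum_{I \in \{0, \binom{[m]}{\le t}, \neg\binom{[m]}{\le t}\}} \hat{p}_{D,I} \log\left(\frac{\hat{p}_{D,I}}{\hat{p}^{(t)}_I}\right)$$

to be the KL divergence between $\hat{p}_D$ and the current approximation $\hat{p}^{(t)}$. Note that the sum iterates
over all $2\binom{m}{\le t} + 1$ indices in $\hat{p}$. We have the following fact about KL divergence.

Fact 3.5. For all $t$: $\Psi_t \ge 0$, and $\Psi_0 \le \log\left(2\binom{m}{\le t} + 1\right)$.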

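We will argue that in each step the potential drops by at least $\alpha^2/16W^2$. Because the potential
begins at $\log\left(2\binom{m}{\le t} + 1\right)$, and must always be non-negative, we know that there can be at most
$B(\alpha) \le 16W^2 \log\left(2\binom{m}{\le t} + 1\right)/\alpha^2$ steps before the algorithm outputs a (vector representation of a)
polynomial that approximates $f_D$ on $\mathcal{Y}_{\mathcal{Q}}$.

Lemma 3.6 (Modification of [HR10]).

$$\Psi_t - \Psi_{t+1} \ge \eta\left(\langle \hat{p}^{(t)}, r^{(t)} \rangle - \langle \hat{p}_D, r^{(t)} \rangle\right) - \eta^2.$$

Proof. All sums below range over the indices $I \in \{0, \binom{[m]}{\le t}, \neg\binom{[m]}{\le t}\}$.

$$\begin{aligned}
\Psi_t - \Psi_{t+1} &= \sum_{I} \hat{p}_{D,I} \log\left(\frac{\hat{p}^{(t+1)}_I}{\hat{p}^{(t)}_I}\right) \\
&= -\eta\langle \hat{p}_D, r^{(t)} \rangle - \log\left(\sum_{I} \exp(-\eta r^{(t)}_I)\, \hat{p}^{(t)}_I\right) \\
&\ge -\eta\langle \hat{p}_D, r^{(t)} \rangle - \log\left(\sum_{I} \hat{p}^{(t)}_I \left(1 + \eta^2 - \eta r^{(t)}_I\right)\right) \\
&\ge \eta\left(\langle \hat{p}^{(t)}, r^{(t)} \rangle - \langle \hat{p}_D, r^{(t)} \rangle\right) - \eta^2.
\end{aligned}$$

The first inequality uses $e^{-x} \le 1 - x + x^2$ for $|x| \le 1$ together with $|r^{(t)}_I| \le 1$, and the second uses
$\log(1 + x) \le x$ and $\|\hat{p}^{(t)}\|_1 = 1$. □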

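The rest of the proof now follows easily. By the conditions of an iterative database construction
algorithm, $|a^{(t)} - f_D(y^{(t)})| \le \alpha/2$. Hence, for each $t$ such that $|W\langle \hat{p}^{(t)}, \hat{y}^{(t)} \rangle - f_D(y^{(t)})| \ge \alpha$, we also
have that $W\langle \hat{p}^{(t)}, \hat{y}^{(t)} \rangle > f_D(y^{(t)})$ if and only if $W\langle \hat{p}^{(t)}, \hat{y}^{(t)} \rangle > a^{(t)}$.

In particular, if $r^{(t)} = \hat{y}^{(t)}$, then $W\langle \hat{p}^{(t)}, \hat{y}^{(t)} \rangle - W\langle \hat{p}_D, \hat{y}^{(t)} \rangle \ge \alpha/2$. Similarly, if $r^{(t)} = -\hat{y}^{(t)}$,
then $W\langle \hat{p}_D, \hat{y}^{(t)} \rangle - W\langle \hat{p}^{(t)}, \hat{y}^{(t)} \rangle \ge \alpha/2$. Here we have utilized the fact that $|p_D(y) - f_D(y)| \le \alpha/4$.
Therefore, by Lemma 3.6 and the fact that $\eta = \alpha/4W$:

$$\Psi_t - \Psi_{t+1} \ge \frac{\alpha}{4W}\left(\langle \hat{p}^{(t)}, r^{(t)} \rangle - \langle \hat{p}_D, r^{(t)} \rangle\right) - \frac{\alpha^2}{16W^2} \ge \frac{\alpha}{4W}\left(\frac{\alpha}{2W}\right) - \frac{\alpha^2}{16W^2} = \frac{\alpha^2}{16W^2}.$$

□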
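As a sanity check on Algorithm 1 and the update bound of Theorem 3.4, the following Python sketch runs the update rule on a toy instance. It is only an illustration under stated assumptions: the target function is itself a polynomial of weight $W$ (so it is approximated with zero error), the exact answer is used as the estimate $a^{(t)}$, the distinguishing query is found by brute force, and all names (`mw_update`, `run_updates`, the toy coefficients) are hypothetical.

```python
from itertools import combinations, product
from math import exp, log, prod

def monomials(m, t):
    """Subsets S of {0,...,m-1} with |S| <= t (the coefficient index set)."""
    return [S for r in range(t + 1) for S in combinations(range(m), r)]

def encode_query(y, mons):
    """Vector (0, chi_S(y) for all S, -chi_S(y) for all S)."""
    chi = [prod(y[i] for i in S) for S in mons]
    return [0.0] + chi + [-c for c in chi]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mw_update(p_hat, y_hat, a, W, alpha):
    """One update: pick the penalty direction r, reweight, renormalise."""
    eta = alpha / (4 * W)
    sign = 1.0 if a < W * dot(p_hat, y_hat) else -1.0  # r = sign * y_hat
    unnorm = [p * exp(-eta * sign * y) for p, y in zip(p_hat, y_hat)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def run_updates(f, queries, m, t, W, alpha):
    """Update while some query has error >= alpha; return (p_hat, count)."""
    mons = monomials(m, t)
    size = 2 * len(mons) + 1
    p_hat = [1.0 / size] * size  # uniform vector, i.e. the constant 0 polynomial
    encoded = [(y, encode_query(y, mons)) for y in queries]
    count = 0
    while True:
        bad = next(((y, yh) for y, yh in encoded
                    if abs(W * dot(p_hat, yh) - f(y)) >= alpha), None)
        if bad is None:
            return p_hat, count
        y, yh = bad
        p_hat = mw_update(p_hat, yh, f(y), W, alpha)  # exact answer as estimate
        count += 1

# Toy target (an assumption for illustration): degree 2, weight 2, m = 3.
COEFFS = {(): 0.5, (0,): -1.0, (0, 2): 0.25, (1,): 0.25}

def toy_f(y):
    return sum(c * prod(y[i] for i in S) for S, c in COEFFS.items())

M, T, W, ALPHA = 3, 2, 2.0, 0.5
QUERIES = list(product([-1, 1], repeat=M))
P_HAT, COUNT = run_updates(toy_f, QUERIES, M, T, W, ALPHA)
BOUND = 16 * W**2 * log(2 * len(monomials(M, T)) + 1) / ALPHA**2
```

On this instance the loop must stop after at most `BOUND` updates by the potential argument above, and the final vector answers every query in `QUERIES` to within `ALPHA`.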



4 Upper Bounds 

Fact 2.5 shows that in order to develop a differentially private mechanism that can release all $k$-way
marginals of a database, it is sufficient to construct low-weight polynomials that approximate $\mathrm{OR}_d$,
the $d$-variate OR function, on all Boolean inputs of Hamming weight at most $k$. This is the purpose
to which we now turn.

The $\mathrm{OR}_d$ function is easily seen to have an exact polynomial representation of constant weight
and degree $d$ (e.g. see Fact 4.3 below); however, an approximation with smaller degree may be
achieved at the expense of larger weight. The best known weight-degree tradeoff, implicit in the
work of Servedio et al. [STT12], can be stated as follows: there exists a polynomial $p$ of degree $t$
and weight $(d\log(1/\gamma)/t)^{O(d\log^2(1/\gamma)/t)}$ that $\gamma$-approximates the $\mathrm{OR}_d$ function on all Boolean inputs,
for every $t$ larger than $\sqrt{d}\log(1/\gamma)$. Setting the degree $t$ to be $O(d/\log^{0.99} d)$ yields a polynomial of
weight at most $d^{0.01}$ that approximates the $\mathrm{OR}_d$ function over the entire Boolean hypercube to any
desired constant accuracy. On the other hand, Lemma 8 of [STT12] can be shown to imply that
any polynomial of weight $W$ that 1/3-approximates the $\mathrm{OR}_d$ function requires degree $\Omega(d/\log W)$,
essentially matching the $O(d/\log^{0.99} d)$ upper bound of Servedio et al. when $W = d^{0.01}$.

However, in order to privately release $k$-way marginals, we have shown that it suffices to construct
polynomials that are accurate only on inputs of low Hamming weight. In this section, we give
a construction that achieves significantly improved weight-degree trade-offs in this setting. In the
next section, we demonstrate the tightness of our construction by proving matching lower bounds.

We construct our approximations by decomposing the $d$-variate OR function into an OR of
ORs, which is the same approach taken by Servedio et al. [STT12]. Here, the outer OR has fan-in
$m$ and the inner OR has fan-in $d/m$, where the subsequent analysis will determine the appropriate
choice of $m$. In order to obtain an approximation that is accurate on all Boolean inputs, Servedio
et al. approximate the outer OR using a transformation of the Chebyshev polynomials of degree
$\sqrt{m}$, and compute each of the inner ORs exactly.

For $k \ll \log^2 d$, we are able to substantially reduce the degree of the approximating polynomial,
relative to the construction of Servedio et al., by leveraging the fact that we are interested in
approximations that are accurate only on inputs of Hamming weight at most $k$. Specifically, we
are able to approximate the outer OR function using a polynomial of degree only $\sqrt{k}$ rather than
$\sqrt{m}$, and argue that the weight of the resulting polynomial is still bounded by a polynomial in $d$.

We now proceed to prove the main lemmas. For the sake of intuition, we begin with weight-degree
tradeoffs in the simpler setting in which we are concerned with approximating the $\mathrm{OR}_d$
function over the entire Boolean hypercube. The following lemma, proved below for completeness,
is implicit in the work of [STT12].

Lemma 4.1. For every $\gamma > 0$ and $m \in [d]$, there is a polynomial of degree $t = O(d\log(1/\gamma)/\sqrt{m})$
and weight $W = m^{O(\sqrt{m}\log(1/\gamma))}$ that $\gamma$-approximates the $\mathrm{OR}_d$ function.

Our main contribution in this section is the following lemma that gives an improved polynomial
approximation to the $\mathrm{OR}_d$ function restricted to inputs of low Hamming weight.

Lemma 4.2. For every $\gamma > 0$, $k < d$ and $m \in [d] \setminus [k]$, there is a polynomial of degree $t =
O(d\sqrt{k}\log(1/\gamma)/m)$ and weight $W = m^{O(\sqrt{k}\log(1/\gamma))}$ that $\gamma$-approximates the $\mathrm{OR}_d$ function
restricted to inputs of Hamming weight at most $k$.

For any constant accuracy, one may take $m = d^{\Theta(1/\sqrt{k})}$ in the lemma (here the choice of constant
depends on the constants in Fact 4.4 and the desired accuracy) and obtain a polynomial of degree
$d^{1-\Omega(1/\sqrt{k})}$ and weight $d^{0.01}$.

Our constructions use the following basic facts. 

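Fact 4.3. The real polynomial $p_d : \{1, -1\}^d \to \mathbb{R}$,

$$p_d(x) = 2\left(\frac{1}{2^d}\sum_{S \subseteq [d]} \prod_{i \in S} x_i\right) - 1,$$

computes $\mathrm{OR}_d(x)$ and has weight $w(p_d) \le 3$.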
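Fact 4.3 can be checked by brute force for small $d$. The Python sketch below assumes the convention used in the proofs of this section, namely that $1$ encodes FALSE and $-1$ encodes TRUE, and assumes the closed form $p_d(x) = 2^{1-d}\sum_{S \subseteq [d]} \prod_{i \in S} x_i - 1$, which is one polynomial satisfying the fact under that convention; the function names are illustrative.

```python
from itertools import combinations
from math import prod

def p_d_coeffs(d):
    """Coefficients {S: c_S} of 2^(1-d) * sum_S chi_S(x) - 1."""
    coeffs = {S: 2.0 ** (1 - d)
              for r in range(d + 1) for S in combinations(range(d), r)}
    coeffs[()] -= 1.0  # the constant term absorbs the trailing -1
    return coeffs

def evaluate(coeffs, x):
    return sum(c * prod(x[i] for i in S) for S, c in coeffs.items())

def or_pm(x):
    """OR with 1 = FALSE and -1 = TRUE: equals 1 only on the all-FALSE input."""
    return 1 if all(v == 1 for v in x) else -1

def weight(coeffs):
    """L1-norm of the coefficient vector."""
    return sum(abs(c) for c in coeffs.values())
```

The weight of this polynomial works out to $3 - 2^{2-d}$, which is consistent with the bound $w(p_d) \le 3$.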

Fact 4.4 (see e.g. [TUV12]). For every $k \in \mathbb{N}$ and $\gamma > 0$, there exists a univariate real polynomial
$p = \sum_{i=0}^{t_k} c_i x^i$ of degree $t_k$ such that

1. $t_k = O(\sqrt{k}\log(1/\gamma))$,

2. for every $i \in [t_k]$, $|c_i| \le 2^{O(\sqrt{k}\log(1/\gamma))}$,

3. $p(0) = 0$, and

4. for every $x \in [2k]$, $|p(x) - 1| \le \gamma/2$.



Proof of Lemma 4.1. We can compute $\mathrm{OR}_d(y)$ as a disjunction of disjunctions by partitioning the
inputs $y_1, \dots, y_d$ into blocks of size $d/m$ and computing:

$$\mathrm{OR}_m\left(\mathrm{OR}_{d/m}(y_1, \dots, y_{d/m}), \dots, \mathrm{OR}_{d/m}(y_{d-d/m+1}, \dots, y_d)\right).$$

In order to approximately compute $\mathrm{OR}_d(y)$, we compute the inner disjunctions exactly using the
polynomial $p_{d/m}$ given in Fact 4.3 and approximate the outer disjunction using the polynomial from
Fact 4.4. Let

$$Z(y) = p_{d/m}(y_1, \dots, y_{d/m}) + \dots + p_{d/m}(y_{d-d/m+1}, \dots, y_d).$$

Setting $k = m$ in Fact 4.4, let $q_m$ be the resulting polynomial of degree $O(\sqrt{m}\log(1/\gamma))$ and weight
$2^{O(\sqrt{m}\log(1/\gamma))}$. Our final polynomial is

$$1 - 2q_m(m - Z(y)).$$

Note that $m - Z(y)$ takes values in $\{0, \dots, 2m\}$ and is exactly 0 when all inputs $y_1, \dots, y_d$ are FALSE.
It follows that our final polynomial indeed approximates $\mathrm{OR}_d$ to additive error $\gamma$ on all Boolean
inputs.

We bound the degree and weight of this polynomial in $y$. By Fact 4.3 the inner disjunctions
are computed exactly using degree $d/m$ and weight at most 3. Hence, the total degree is
$O(\sqrt{m}\log(1/\gamma) \cdot d/m)$. To bound the weight, we observe that the outer polynomial $q_m(\cdot)$ has at
most $T = m^{O(\sqrt{m}\log(1/\gamma))}$ terms, where each one has degree at most $D_{\mathrm{outer}} = O(\sqrt{m}\log(1/\gamma))$ and
coefficients of absolute value at most $C_{\mathrm{outer}} = 2^{O(\sqrt{m}\log(1/\gamma))}$. Expanding the polynomials for $Z(y)$,
the weight of each term incurs a multiplicative factor of $C_{\mathrm{inner}} \le 3^{D_{\mathrm{outer}}} = 3^{O(\sqrt{m}\log(1/\gamma))}$. So the total
weight is at most $C_{\mathrm{inner}} \cdot C_{\mathrm{outer}} \cdot T = m^{O(\sqrt{m}\log(1/\gamma))}$. □

Proof of Lemma 4.2. Again we partition the inputs $y_1, \dots, y_d$ into blocks of size $d/m$ and view the
disjunction as:

$$\mathrm{OR}_m\left(\mathrm{OR}_{d/m}(y_1, \dots, y_{d/m}), \dots, \mathrm{OR}_{d/m}(y_{d-d/m+1}, \dots, y_d)\right).$$

Once again, we compute the inner disjunctions exactly using the polynomial from Fact 4.3. Let

$$Z(y) = p_{d/m}(y_1, \dots, y_{d/m}) + \dots + p_{d/m}(y_{d-d/m+1}, \dots, y_d).$$

If the input $y$ has Hamming weight at most $k$, then $Z(y)$ takes values in $\{m, \dots, m - 2k\}$.
Thus, we may approximate the outer disjunction using a polynomial $q_k$ of degree $O(\sqrt{k}\log(1/\gamma))$ from
Fact 4.4. Our final polynomial is:

$$1 - 2q_k(m - Z(y)).$$

The bound on degree and weight may be obtained as in the previous lemma. □

4.1 Proof of Theorem 1.1

We now present the proof of Theorem 1.1.

Proof of Theorem 1.1. Taking $m = O((\log d/\log\log d)^2)$ in Lemma 4.1, taking $m = d^{\Theta(1/\sqrt{k})}$ in
Lemma 4.2, and combining with Fact 2.5, it follows that for some constant $C > 0$, the family
of $d$-variate disjunctions restricted to $H_{d,k}$ is 0.01-approximated by the family of $d$-variate real
polynomials of degree $t$ and weight $W$ where

$$t = \min\left\{d^{1 - \frac{1}{C\sqrt{k}}}, \frac{d}{\log^{0.995} d}\right\} \quad \text{and} \quad W = d^{0.01}.$$

Consequently, by Theorem 3.4 we have an algorithm that is a $B(1/400)$-iterative database
construction where

$$B(1/400) = O(d^{0.02}\, t \log d) = O\left(d^{0.02} \log d \cdot \min\left\{d^{1 - \frac{1}{C\sqrt{k}}}, \frac{d}{\log^{0.995} d}\right\}\right)$$

and the algorithm runs in time $T = \mathrm{poly}(\binom{d}{\le t})$.

Thus, by Theorem 3.3 we have an $(\epsilon, \delta)$-differentially private online algorithm that is $(0.01, 0.01)$-accurate
for any sequence $Q$ of (possibly adaptively chosen) $k$-way marginal queries provided the
size of the database satisfies

$$n = \Omega\left(\frac{\log(100|Q|)\log(4/\delta)}{\epsilon} \cdot d^{0.01}\sqrt{\log d} \cdot \min\left\{d^{\frac{1}{2} - \frac{1}{2C\sqrt{k}}}, \frac{\sqrt{d}}{\log^{0.4975} d}\right\}\right).$$

Further, the algorithm runs in time $\mathrm{poly}(T) = \mathrm{poly}(\binom{d}{\le t}) = \min\left\{\exp\left(d^{1 - \Omega(1/\sqrt{k})}\right), \exp\left(d/\log^{0.99} d\right)\right\}$.

□

Remark 1 in the Introduction follows from using a slightly different choice of $m$ in Lemma 4.1,
namely $m = O(\log d/\log\log d)$.

To obtain the summary of the database promised in Remark 2, we request an answer to each
of the $k$-way marginal queries $B(1/400)$ times. Doing so will ensure that we obtain a maximal
database update sequence, and it was argued in Section 2.1 that the polynomial resulting from any
maximal database update sequence accurately answers every $k$-way marginal query. Finally, we
obtain a compact summary by randomly choosing 0{kd'^^) samples from the normalized coefficient 
vector of this polynomial to obtain a new sparse polynomial that accurately answers every k-way 
marginal query (see e.g. [BS92]). Our compact summary is this final sparse polynomial.

5 Lower Bounds 

In this section, we address the general problem of approximating a block-composed function $G =
F(\dots, f(\cdot), \dots)$, where $F : \{-1,1\}^k \to \{-1,1\}$, $f : Y \to \{-1,1\}$, $Y \subseteq \mathbb{R}^{d/k}$, with inputs restricted
to a set $\mathcal{Y} \subseteq Y^k$, using low-weight polynomials. We give a lower bound on the minimum degree of
such polynomials. In our main application, $G$ will equal $\mathrm{OR}_d$, and $\mathcal{Y}$ will be the set of all length $d$
Boolean vectors of Hamming weight at most $k$.

Our proof technique is inspired by the composition theorem lower bounds shown in [She09,
Theorem 3.1], where it is shown that the $\gamma$-approximate degree of the composed function $G$ is
at least the product of the $\gamma$-approximate degree of the outer function and the PTF degree of
the inner function. Our main contribution is a generalization of such a composition theorem
along two directions: (1) we show degree lower bounds that take into account the $L_1$-norm of the
coefficient vector of the approximating polynomial, and (2) our lower bounds hold even when we
only require the approximation to be accurate on inputs of low Hamming weight, while prior work
only considered approximations that are accurate on the entire Boolean hypercube.

Our main theorem is stated below. In parsing the statement of the theorem, it may be helpful
to think of $G = \mathrm{OR}_d$, $\mathcal{Y} = H_{d,k}$, the set of all length $d$ Boolean vectors of Hamming weight at
most $k$, $f = \mathrm{OR}_{d/k}$, $F = \mathrm{OR}_k$, $Y = \{-1,1\}^{d/k}$, and $H = H_{d/k,1}$, the set of all Boolean vectors
of Hamming weight at most 1. This will be the setting of interest in our main application of the
theorem.



Theorem 5.1. Let $Y \subseteq \mathbb{R}^{d/k}$ be a finite set and $\gamma > 0$. Given $f : Y \to \{-1,1\}$ and $F : \{-1,1\}^k \to
\{-1,1\}$ such that $\deg_{2\gamma}(F) = D$, let $G : Y^k \to \{-1,1\}$ denote the composed function defined by
$G(Y_1, \dots, Y_k) = F(f(Y_1), \dots, f(Y_k))$. Let $\mathcal{Y} \subseteq Y^k$. Suppose there exists $H \subseteq Y$ such that for every
$(Y_1, \dots, Y_k) \in Y^k \setminus \mathcal{Y}$ there exists $i \in [k]$ such that $Y_i \in Y \setminus H$. Then, for every $t \in \mathbb{Z}_+$,

$$\deg_{(\gamma, W)}(G|_{\mathcal{Y}}) \ge \frac{1}{2}tD \quad \text{for every } W \le \gamma 2^{-k} w^*(f|_H, t)^{D/2}.$$

We derive the following corollary from Theorem 5.1. Theorem 1.3 follows immediately from
Corollary 5.2 by considering any $k = o(\log d)$.

Corollary 5.2. Let $k \in [d]$. Then, there exists a universal constant $C > 0$ such that

$$\deg_{(1/6, W)}(\mathrm{OR}_d|_{H_{d,k}}) = \Omega\left(\frac{d}{\sqrt{k}} \cdot \left(6W2^k\right)^{-\frac{1}{C\sqrt{k}}}\right).$$

Intuition underlying our proof technique. Recall that our upper bound in Section 4 worked
as follows. We viewed $\mathrm{OR}_d$ as an "OR of ORs", and we approximated the outer OR with a
polynomial $p$ of degree $\deg_{\mathrm{outer}}$ chosen to be as small as possible, and composed $p$ with a low-weight
but high-degree polynomial computing each inner OR. We needed to make sure the weight
$W_{\mathrm{inner}}$ of the inner polynomials was very low, because the composition step potentially blows the
weight up to roughly $W_{\mathrm{inner}}^{\deg_{\mathrm{outer}}}$. As a result, the inner polynomials had to have very high degree,
to keep their weight low.

Intuitively, we construct a dual solution to a certain linear program that captures the intuition
that any low-weight, low-degree polynomial approximation to $\mathrm{OR}_d$ must look something like our
primal solution, composing a low-degree approximation to an "outer" OR with low-weight approximations
to inner ORs. Moreover, our dual solution formalizes the intuition that the composition
step must result in a massive blowup in weight, from $W_{\mathrm{inner}}$ to roughly $W_{\mathrm{inner}}^{\deg_{\mathrm{outer}}}$.

In more detail, our dual construction works by writing $\mathrm{OR}_d$ as an OR of ORs, where the outer
OR is over $k$ variables, and each inner OR is over $d/k$ variables. We obtain our dual solution
by carefully combining a dual witness $\Gamma$ to the high approximate degree of the outer OR with a
dual witness $\psi$ to the fact that any low-degree polynomial with margin at least 1 for each inner
OR must have "large" weight, even if the polynomial must satisfy the margin constraint only on
inputs of Hamming weight 0 or 1. This latter condition, that $\psi$ must witness high non-constant
margin weight even if restricted to inputs of Hamming weight 0 or 1, is essential to ensuring that
our combined dual witness does not place any "mass" on irrelevant inputs, i.e. those of Hamming
weight larger than $k$.

5.1 Duality Theorems 

In the rest of the section, we let $\chi_S(x) = \prod_{i \in S} x_i$ for any given set $S \subseteq [d]$. The question of
existence of a weight $W$ polynomial with small degree that $\gamma$-approximates a given function can be
expressed as a feasibility problem for a linear program. Now, in order to show the non-existence
of such a polynomial, it is sufficient to show infeasibility of the linear program. By duality, this is
equivalent to demonstrating existence of a solution to the corresponding dual program. We begin
by summarizing the duality theorems that will be useful in exhibiting this witness.

Theorem 5.3 (Duality Theorem for $(\gamma, W)$-approximate degree). Fix $\gamma > 0$ and let $f : Y \to
\{-1,1\}$ be given for some finite set $Y \subseteq \mathbb{R}^d$. Then, $\deg_{(\gamma,W)}(f) \ge t + 1$ if and only if there exists a
function $\Psi : Y \to \mathbb{R}$ such that

1. $\sum_{y \in Y} |\Psi(y)| = 1$,

2. $\sum_{y \in Y} \Psi(y)f(y) - W \cdot \left|\sum_{y \in Y} \Psi(y)\chi_S(y)\right| > \gamma$ for every $S \subseteq [d]$, $|S| \le t$.

Proof. By definition, $\deg_{(\gamma,W)}(f) \le t$ if and only if there exist $(\lambda_S)_{S \subseteq [d], |S| \le t}$ such that

$$\sum_{S \subseteq [d], |S| \le t} |\lambda_S| \le W, \quad \text{and} \quad \left|f(y) - \sum_{S \subseteq [d], |S| \le t} \lambda_S \chi_S(y)\right| \le \gamma \;\; \forall\, y \in Y.$$

By Farkas' lemma, $\deg_{(\gamma,W)}(f) \le t$ if and only if there does not exist $\Psi : Y \to \mathbb{R}$ such that

$$\frac{1}{W}\sum_{y \in Y}\left(f(y)\Psi(y) - \gamma|\Psi(y)|\right) > \left|\sum_{y \in Y} \chi_S(y)\Psi(y)\right| \quad \forall\, S \subseteq [d], |S| \le t.$$

□



The dual witness that we construct to prove Theorem 5.1 is obtained by combining a dual
witness for the large non-constant margin weight of the inner function with a dual witness for the
large approximate degree for the outer function. The duality conditions for these are given below.
The proof of the duality condition for the case of $\gamma$-approximate degree is well-known, and we omit
the proof for brevity (see e.g. [She11, S08, BT13]).

Theorem 5.4 (Duality Theorem for $\gamma$-approximate degree). Fix $\gamma > 0$ and let $f : Y \to \{-1,1\}$
be given, where $Y \subseteq \mathbb{R}^d$ is a finite set. Then, $\deg_\gamma(f) \ge t + 1$ if and only if there exists a function
$\Gamma : Y \to \mathbb{R}$ such that

1. $\sum_{y \in Y} |\Gamma(y)| = 1$,

2. $\sum_{y \in Y} \Gamma(y)p(y) = 0$ for every polynomial $p$ of degree at most $t$, and

3. $\sum_{y \in Y} \Gamma(y)f(y) > \gamma$.

Theorem 5.5 (Duality Theorem for non-constant margin weight). Let $Y \subseteq \mathbb{R}^d$ be a finite set, let
$f : Y \to \{1, -1\}$ be a given function and $w > 0$. The non-constant margin weight $w^*(f, t) \ge w$ if
and only if there exists a distribution $\mu : Y \to [0,1]$ such that

1. $\sum_{y \in Y} \mu(y)f(y) = 0$,

2. $\left|\sum_{y \in Y} \mu(y)f(y)\chi_S(y)\right| \le \frac{1}{w}$ for every $S \subseteq [d]$, $|S| \le t$.



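Proof. Let $\mathcal{S} = \{S \subseteq [d] : |S| \le t\}$ and $\bar{\mathcal{S}} = \mathcal{S} \setminus \{\emptyset\}$. By definition, $w^*(f, t)$ is expressed by the following
linear program:

$$\begin{aligned}
\min\ & \sum_{S \in \bar{\mathcal{S}}} |\lambda_S| \\
& f(y)\sum_{S \in \mathcal{S}} \lambda_S \chi_S(y) \ge 1 \quad \forall\, y \in Y.
\end{aligned}$$

The above linear program can be restated as follows:

$$\begin{aligned}
\min\ & \sum_{S \in \bar{\mathcal{S}}} a_S \\
& a_S + \lambda_S \ge 0 \quad \forall\, S \in \bar{\mathcal{S}}, \\
& a_S - \lambda_S \ge 0 \quad \forall\, S \in \bar{\mathcal{S}}, \\
& f(y)\sum_{S \in \mathcal{S}} \lambda_S \chi_S(y) \ge 1 \quad \forall\, y \in Y, \text{ and} \\
& a_S \ge 0 \quad \forall\, S \in \bar{\mathcal{S}}.
\end{aligned}$$

The dual program is expressed below:

$$\begin{aligned}
\max\ & \sum_{y \in Y} \mu(y) \\
& u_1(S) + u_2(S) \le 1 \quad \forall\, S \in \bar{\mathcal{S}}, \\
& \sum_{y \in Y} \mu(y)f(y)\chi_S(y) + u_1(S) - u_2(S) = 0 \quad \forall\, S \in \bar{\mathcal{S}}, \\
& \sum_{y \in Y} \mu(y)f(y) = 0, \\
& \mu(y) \ge 0 \quad \forall\, y \in Y, \qquad u_1(S), u_2(S) \ge 0 \quad \forall\, S \in \bar{\mathcal{S}}.
\end{aligned}$$

By standard manipulations, the above dual program is equivalent to

$$\begin{aligned}
\max\ & \sum_{y \in Y} \mu(y) \\
& \left|\sum_{y \in Y} \mu(y)\chi_S(y)f(y)\right| \le 1 \quad \forall\, S \in \bar{\mathcal{S}}, \\
& \sum_{y \in Y} \mu(y)f(y) = 0, \\
& \mu(y) \ge 0 \quad \forall\, y \in Y.
\end{aligned}$$

Finally, given a distribution $\mu'$ satisfying the hypothesis of the theorem, one can obtain a
dual solution $\mu$ to show that $w^*(f, t) \ge w$ by taking $w^{-1} = \max_{S \in \bar{\mathcal{S}}} \left|\sum_{y \in Y} \mu'(y)\chi_S(y)f(y)\right|$ and
setting $\mu(y) = w\mu'(y)$ for all $y \in Y$. In the other direction, if $w^*(f, t) \ge w$, then we have a dual
solution $\mu$ satisfying the above dual program such that $\sum_{y \in Y} \mu(y) = w^*(f, t)$. By setting $\mu'(y) =
\mu(y)/w^*(f, t)$ for all $y \in Y$, we obtain the desired distribution. □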

5.2 Proof of Theorem 5.1

Our approach to exhibiting a dual witness as per Theorem 5.3 is to build a dual witness by
appropriately combining the dual witnesses for the "hardness" of the inner and outer functions.
Our method of combining the dual witnesses is inspired by the technique of [She09, Theorem 3.7].

Proof of Theorem 5.1. Let $w = w^*(f|_H, t)$. We will exhibit a dual witness function $\Psi : \mathcal{Y} \to \mathbb{R}$
corresponding to Theorem 5.3 for the specified choice of degree and weight. For $y \in \mathcal{Y}$, let
$Y_i = (y_{(i-1)(d/k)+1}, \dots, y_{i d/k})$. By Theorem 5.5 we know that there exists a distribution $\mu : H \to [0,1]$
such that

$$\sum_{y \in H} \mu(y)f(y) = 0, \qquad (2)$$

$$\left|\sum_{y \in H} \mu(y)f(y)\chi_S(y)\right| \le \frac{1}{w} \quad \forall\, |S| \le t. \qquad (3)$$

We set $\mu(y) = 0$ for $y \in Y \setminus H$.

Since $\deg_{2\gamma}(F) = D$, by Theorem 5.4 we know that there exists a function $\Gamma : \{-1,1\}^k \to \mathbb{R}$
such that

$$\sum_{x \in \{-1,1\}^k} |\Gamma(x)| = 1, \qquad (4)$$

$$\sum_{x \in \{-1,1\}^k} \Gamma(x)p(x) = 0 \text{ for every polynomial } p \text{ of degree less than } D, \text{ and} \qquad (5)$$

$$\sum_{x \in \{-1,1\}^k} \Gamma(x)F(x) > 2\gamma. \qquad (6)$$

Consider the function $\Psi : Y^k \to \mathbb{R}$ defined as $\Psi(y) = 2^k\,\Gamma(f(Y_1), \dots, f(Y_k)) \prod_{i=1}^{k} \mu(Y_i)$. By the
hypothesis of the theorem, we know that if $(Y_1, \dots, Y_k) \in Y^k \setminus \mathcal{Y}$, then there exists $i \in [k]$ such that
$Y_i \in Y \setminus H$ and hence $\mu(Y_i) = 0$ and therefore $\Psi(Y_1, \dots, Y_k) = 0$.

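1. We have

$$\sum_{y \in \mathcal{Y}} |\Psi(y)| = \sum_{y \in \mathcal{Y}} 2^k\,|\Gamma(f(Y_1), \dots, f(Y_k))| \prod_{i=1}^{k} \mu(Y_i) = 2^k\,\mathbb{E}_{y \sim \Phi}\left(|\Gamma(f(Y_1), \dots, f(Y_k))|\right),$$

where $y \sim \Phi$ denotes $y$ chosen from the product distribution $\Phi : Y^k \to [0,1]$ defined by
$\Phi(y) = \prod_{i \in [k]} \mu(Y_i)$. Since $\sum_{y \in Y} \mu(y)f(y) = 0$, it follows that if $Y_i$ is chosen with probability
$\mu(Y_i)$, then $f(Y_i)$ is uniformly distributed in $\{-1,1\}$. Consequently,

$$\sum_{y \in \mathcal{Y}} |\Psi(y)| = 2^k\,\mathbb{E}_{z \sim \{-1,1\}^k}\left(|\Gamma(z_1, \dots, z_k)|\right) = 1.$$

The last equality is by using (4).

2. By the same reasoning as above, it follows from (6) that

$$\sum_{y \in \mathcal{Y}} \Psi(y)G(y) = \sum_{z \in \{-1,1\}^k} \Gamma(z)F(z) > 2\gamma.$$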

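3. Fix a subset $S \subseteq [d]$ of size at most $tD/2$. Let $S_i = S \cap \{(i-1)(d/k) + 1, \dots, id/k\}$ for each
$i \in [k]$. Consequently, $\chi_S(y) = \prod_{i=1}^{k} \chi_{S_i}(Y_i)$.

Now using the Fourier coefficients $\hat{\Gamma}(T)$ of the function $\Gamma$, we can express

$$\Gamma(z_1, \dots, z_k) = \sum_{T \subseteq [k]} \hat{\Gamma}(T) \prod_{i \in T} z_i = \sum_{T \subseteq [k],\, |T| \ge D} \hat{\Gamma}(T) \prod_{i \in T} z_i$$

since $\hat{\Gamma}(T) = 0$ if $|T| < D$ by (5). Hence,

$$\Psi(y) = 2^k \sum_{T \subseteq [k],\, |T| \ge D} \hat{\Gamma}(T) \prod_{i \in T} f(Y_i)\mu(Y_i) \cdot \prod_{i \in [k] \setminus T} \mu(Y_i).$$

Therefore,

$$\begin{aligned}
\sum_{y \in \mathcal{Y}} \Psi(y)\chi_S(y) &= \sum_{y \in \mathcal{Y}} \left(2^k \sum_{T \subseteq [k],\, |T| \ge D} \hat{\Gamma}(T) \prod_{i \in T} f(Y_i)\mu(Y_i) \cdot \prod_{i \in [k] \setminus T} \mu(Y_i)\right) \prod_{i \in [k]} \chi_{S_i}(Y_i) \\
&= 2^k \sum_{T \subseteq [k],\, |T| \ge D} \hat{\Gamma}(T) \sum_{y \in \mathcal{Y}} \left(\prod_{i \in T} f(Y_i)\mu(Y_i)\chi_{S_i}(Y_i) \prod_{i \in [k] \setminus T} \mu(Y_i)\chi_{S_i}(Y_i)\right) \\
&= 2^k \sum_{T \subseteq [k],\, |T| \ge D} \hat{\Gamma}(T) \sum_{Y_1, \dots, Y_k \in H} \left(\prod_{i \in T} f(Y_i)\mu(Y_i)\chi_{S_i}(Y_i) \prod_{i \in [k] \setminus T} \mu(Y_i)\chi_{S_i}(Y_i)\right).
\end{aligned}$$

Rearranging, we have

$$\sum_{y \in \mathcal{Y}} \Psi(y)\chi_S(y) = 2^k \sum_{T \subseteq [k],\, |T| \ge D} \hat{\Gamma}(T) \prod_{i \in T} \left(\sum_{Y_i \in H} f(Y_i)\mu(Y_i)\chi_{S_i}(Y_i)\right) \prod_{i \in [k] \setminus T} \left(\sum_{Y_i \in H} \mu(Y_i)\chi_{S_i}(Y_i)\right). \qquad (7)$$

Now, we will bound each product term in the outer sum by $w^{-D/2}$. We first observe that for
every $i \in [k]$,

$$\left|\sum_{x \in H} \mu(x)\chi_{S_i}(x)\right| \le \sum_{x \in H} \mu(x) = 1.$$

If $|S_i| \le t$, then by (3),

$$\left|\sum_{x \in H} f(x)\mu(x)\chi_{S_i}(x)\right| \le \frac{1}{w}.$$

If $|S_i| > t$, then

$$\left|\sum_{x \in H} f(x)\mu(x)\chi_{S_i}(x)\right| \le \sum_{x \in H} \mu(x) = 1.$$

Since $\sum_{i=1}^{k} |S_i| \le tD/2$, it follows that $|S_i| \le t$ for more than $k - D/2$ indices $i \in [k]$. Thus,
for each $T \subseteq [k]$ such that $|T| \ge D$, there are at least $D/2$ indices $i \in T$ such that $|S_i| \le t$.
Hence,

$$\left|\sum_{y \in \mathcal{Y}} \Psi(y)\chi_S(y)\right| \le 2^k w^{-D/2} \sum_{T \subseteq [k],\, |T| \ge D} |\hat{\Gamma}(T)| \le 2^k w^{-D/2}.$$

Here, the last inequality is because $|\hat{\Gamma}(T)| \le 2^{-k}$ from (4).

From 1, 2 and 3, we have

$$\sum_{y \in \mathcal{Y}} \Psi(y)G(y) - W \cdot \max_{S \subseteq [d],\, |S| \le tD/2} \left|\sum_{y \in \mathcal{Y}} \Psi(y)\chi_S(y)\right| > \gamma.$$

□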

We now derive Corollary 5.2. We need the following theorems on the approximate degree and
the non-constant margin weight of the $\mathrm{OR}_d$ function.

Theorem 5.6 (Approximate degree of $\mathrm{OR}_d$ [Pat92]). $\deg_{1/3}(\mathrm{OR}_d) = \Theta(\sqrt{d})$.

Lemma 5.7 (Non-constant margin weight of $\mathrm{OR}_d$). $w^*(\mathrm{OR}_d|_{H_{d,1}}, t) \ge d/t$.

Proof. The function

$$\mu(x) = \begin{cases} 1/2 & \text{if } x = (1, \dots, 1), \\ 1/2d & \text{if } x \in H_{d,1} \setminus \{(1, \dots, 1)\} \end{cases}$$

acts as the dual witness in Theorem 5.5. □
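The two conditions of Theorem 5.5 for this witness can be verified directly for small parameters. The following Python sketch does so with exact rational arithmetic; it assumes the convention that $1$ encodes FALSE and $-1$ encodes TRUE (so $\mathrm{OR}_d$ equals $1$ only on the all-ones input), and the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def check_witness(d, t):
    """Return (total mass, correlation with f, worst correlation with f * chi_S)."""
    all_false = tuple([1] * d)
    # H_{d,1}: the all-ones input plus the d inputs with a single -1.
    points = [all_false] + [tuple(-1 if i == j else 1 for i in range(d))
                            for j in range(d)]
    mu = {x: Fraction(1, 2) if x == all_false else Fraction(1, 2 * d)
          for x in points}
    f = {x: 1 if x == all_false else -1 for x in points}
    total_mass = sum(mu.values())
    correlation = sum(mu[x] * f[x] for x in points)
    worst = max(
        abs(sum(mu[x] * f[x] * prod(x[i] for i in S) for x in points))
        for r in range(1, t + 1)
        for S in combinations(range(d), r)
    )
    return total_mass, correlation, worst
```

A direct calculation gives $\sum_x \mu(x)f(x)\chi_S(x) = |S|/d$, so the worst case over $|S| \le t$ is $t/d$, which matches the bound $1/w$ with $w = d/t$.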



Proof of Corollary 5.2. We use Theorem 5.1 in the following setting. Let $Y = \{-1,1\}^{d/k}$, the
inner function $f : Y \to \{-1,1\}$ be $\mathrm{OR}_{d/k}$ and the outer function $F : \{-1,1\}^k \to \{-1,1\}$ be $\mathrm{OR}_k$,
$\mathcal{Y} = H_{d,k}$ and $H = H_{d/k,1}$. By a simple counting argument, if $(Y_1, \dots, Y_k) \in \{-1,1\}^d \setminus H_{d,k}$, then
there exists $i \in [k]$ such that $Y_i \in \{-1,1\}^{d/k} \setminus H_{d/k,1}$. Further, by Theorem 5.6 we know that
$\deg_{1/3}(F) = \Theta(\sqrt{k})$ and by Lemma 5.7 we know that $w^*(f|_H, t) \ge d/kt$. Therefore, by Theorem
5.1 we have that, for every $t \in \mathbb{Z}_+$,

$$\deg_{(1/6, W)}(\mathrm{OR}_d|_{H_{d,k}}) = \Omega(t\sqrt{k}) \quad \text{for every } W \le \frac{1}{6}\,2^{-k}\left(\frac{d}{kt}\right)^{C\sqrt{k}}.$$

We obtain the conclusion by taking $t = \left\lfloor (d/k)(6W2^k)^{-1/(C\sqrt{k})} \right\rfloor$. □

Comparison to [STT12]. As described in the beginning of Section 4, Lemma 8 of the work
of Servedio et al. [STT12] can be shown to imply that any polynomial $p$ of weight $W$ that 1/3-approximates
the $\mathrm{OR}_d$ function at all Boolean inputs requires degree $\Omega(d/\log W)$.$^1$ The proof
in [STT12] relies on a Markov-type inequality that bounds the derivative of a univariate polynomial
in terms of its degree and the size of its coefficients. The proof of this Markov-type inequality is
non-constructive and relies on complex analysis.

Here, we observe that our dual witness construction used to prove Corollary 5.2 also yields a
general lower bound on the tradeoffs achievable between the weight and degree of the approximating
polynomial $p$, even when we require $p$ to be accurate only on inputs of Hamming weight at most
$O(\log W)$ (see Theorem 5.8). The methods of Servedio et al. do not appear to yield any non-trivial
lower bound on the degree in this setting. We also believe our proof technique is of interest
in comparison to the methods of Servedio et al. as it is constructive (exhibiting an explicit dual
witness for the lower bound) and avoids the use of complex analysis.



$^1$ More precisely, [STT12, Lemma 8] as stated shows that if the coefficients of a univariate polynomial $P$ each
have absolute value at most $W$, and $1/2 \le \max_{x \in [0,1]} |P(x)| \le R$, then $\max_{x \in [0,1]} |P'(x)| = O(\deg(P) \cdot R \cdot (\log W +
\log\deg(P)))$, where $P'(x)$ denotes the derivative of $P$ at $x$, and $\deg(P)$ denotes the degree of $P$. By inspection of the
proof, it is easily seen that if the $L_1$-norm of the coefficients of $P$ is bounded by $W$, then the following slightly stronger
conclusion holds: $\max_{x \in [0,1]} |P'(x)| = O(\deg(P) \cdot R \cdot \log W)$. When combined with the symmetrization argument
of [STT12], this stronger conclusion implies: any polynomial $p$ of weight $W$ that 1/3-approximates the $\mathrm{OR}_d$ function
at all Boolean inputs requires degree $\Omega(d/\log W)$.

Theorem 5.8. Any polynomial $p$ of weight $W$ that 1/6-approximates the $\mathrm{OR}_d$ function at all
Boolean inputs requires degree $d/2^{O(\sqrt{\log W})}$.

Proof. As $p$ is accurate on the entire Boolean hypercube, it is accurate on inputs of Hamming weight
at most $\log W$. The theorem follows by setting $k = \log W$ in the statement of Corollary 5.2. □

6 Discussion 

We gave a differentially private online algorithm for answering $k$-way marginal queries that runs
in time $2^{o(d)}$ per query, and guarantees accurate answers for databases of size $\mathrm{poly}(d, k)$. More
precisely, we showed that if there exists a polynomial of degree $t$ and weight $W$ approximating
the OR function on Boolean inputs of Hamming weight up to $k$, then a variant of the private
multiplicative weights algorithm can answer $k$-way marginal queries in time roughly $\binom{d}{t}$ per query
and guarantee accurate answers on databases of size roughly $W\sqrt{d}$. To this end, we gave a new
construction showing the existence of polynomial approximations to the OR function on inputs of
low Hamming weight. Specifically, we showed that polynomials of weight $d^{0.01}$ and degree $d^{1-\Omega(1/\sqrt{k})}$
exist.

Our algorithm for answering $k$-way marginals is essentially the same as in [HR10], but using
a different set of base functions (specifically, the set of all low-degree parities), which leads to
an efficiency gain. We note that our algorithm degrades smoothly to the private multiplicative
weights algorithm as the degree of the promised polynomial approximation increases, and never
gives a worse running time. This behavior suggests that our algorithm may lead to practical
improvements even for relatively small values of $d$, for which the asymptotic analysis does not
apply. In such cases one might use an alternative but similar analysis that shows the existence of a
polynomial of degree $kd^{1-c/k}$ and weight $d^{c}$ (for any $0 < c < k$) that exactly computes the $d$-variate
OR function on inputs of Hamming weight at most $k$. Such a polynomial may be obtained as in
our construction, by breaking the $d$-variate OR function into an OR of ORs, and using a degree
$k$ polynomial defined via polynomial interpolation, instead of a transformation of the Chebyshev
polynomials, to approximate the outer OR on inputs of Hamming weight at most $k$.
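The interpolation-based variant can be sketched in Python as follows. The sketch assumes the same conventions as the upper-bound proofs ($1$ encodes FALSE, $-1$ encodes TRUE, blocks of equal size $d/m$), takes the outer polynomial $q$ to be the degree-$k$ interpolant with $q(0) = 0$ and $q(2j) = 1$ for $j = 1, \dots, k$, and evaluates the inner ORs directly rather than expanding them into monomials. The function names are illustrative, and the sketch checks exactness only, not the degree and weight bounds.

```python
from fractions import Fraction
from itertools import combinations

def outer_q(x, k):
    """Degree-k Lagrange interpolant with q(0) = 0 and q(2j) = 1 for j = 1..k."""
    nodes = [2 * i for i in range(k + 1)]
    total = Fraction(0)
    for xj in nodes[1:]:  # the node 0 carries value 0 and contributes nothing
        term = Fraction(1)
        for xi in nodes:
            if xi != xj:
                term *= Fraction(x - xi, xj - xi)
        total += term
    return total

def block_or(block):
    """Exact inner OR: 1 if every entry is FALSE (= 1), otherwise -1."""
    return 1 if all(v == 1 for v in block) else -1

def or_of_ors(y, m, k):
    """Evaluate 1 - 2 * q(m - Z(y)), where Z(y) sums the m inner ORs."""
    size = len(y) // m
    z = sum(block_or(y[b * size:(b + 1) * size]) for b in range(m))
    return 1 - 2 * outer_q(m - z, k)

def low_weight_inputs(d, k):
    """All inputs in {-1,1}^d with at most k entries equal to -1 (TRUE)."""
    for r in range(k + 1):
        for pos in combinations(range(d), r):
            yield tuple(-1 if i in pos else 1 for i in range(d))
```

On an input with $j \le k$ TRUE entries, at most $j$ blocks are TRUE, so $m - Z(y)$ is an even number between $0$ and $2k$ and the interpolant returns exactly $0$ or $1$; outside this range no guarantee is made.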

Our lower bounds show that our polynomial approximation to the $\mathrm{OR}_d$ function on inputs of
Hamming weight $k$ is close to the best possible; in particular, we cannot hope to improve the
running time on $\mathrm{poly}(d, k)$ size databases by giving approximating polynomials with better weight
and degree bounds. We do not know if it is possible to do better by using different feature spaces
(other than the set of all low-degree monomials) to uniformly approximate all disjunctions over $d$
variables. We leave this question as an interesting direction for future work.

Acknowledgments. We thank Salil Vadhan for helpful discussions about this work. 